Comments on VMDK formats

To sum up part of my work in GSoC 2011, this post covers some facts about real-world VMDK formats that are not well documented in the official specification, Virtual Disk Format 1.1 (the version may have changed since this was written).

Descriptor File Recognition Criteria

The specification does not clearly define what information an image reader needs in order to recognize a file as a VMDK image. Because the descriptor file is a text file with no magic bytes, recognition depends on its first lines. When you edit a VMDK descriptor file or generate one from scratch, it is best to follow the structure of the sample in the specification. At the very least, the first two lines should not be moved or altered, because the first non-comment line, “version=1”, is the de facto recognition criterion used by VMware Workstation. You can find the details in this post and this post.
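A minimal sketch of the recognition rule described above, assuming the only check is that the first non-blank, non-comment line reads version=1. The function name looks_like_vmdk_descriptor is hypothetical, the sample descriptor lines are illustrative, and VMware Workstation may apply further checks that are not documented.

```python
def looks_like_vmdk_descriptor(text: str) -> bool:
    """Heuristic: True if the first meaningful line of text is 'version=1'.

    Blank lines and '#' comment lines are skipped. Tolerating whitespace
    around '=' is an assumption, not something taken from the spec.
    """
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # comments such as '# Disk DescriptorFile' are allowed
        key, sep, value = line.partition("=")
        # Only the first meaningful line is examined.
        return sep == "=" and key.strip() == "version" and value.strip() == "1"
    return False  # empty or comment-only file


sample = """# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=ffffffff
createType="monolithicSparse"
"""

print(looks_like_vmdk_descriptor(sample))                         # True
print(looks_like_vmdk_descriptor('createType="x"\nversion=1\n'))  # False
```

The second call shows why moving lines matters: the same keys in a different order no longer pass the check.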

ESX Server Sparse Extent and VMDK3

The ESX Server Sparse Extent is only briefly described in the specification. It is said to be a sparse extent type used by ESX servers, but it does not appear in recent ESX products. On the other hand, the format is very close to the obsolete VMDK3 sparse extent (in some respects nearly indistinguishable).

In short, the ESX Server Sparse Extent is not (or is no longer) a real-world VMDK format type. It may be an alias for the old VMDK3 format (which was used only before VMware Workstation 4.0 and is not supported by VMware Workstation 5 and newer), and I have not seen it anywhere except in the specification. There are, however, differences between VMDK3 and the ESX Server Sparse Extent; they are discussed in detail here.
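The similarity is visible in the very first bytes: both VMDK3 and the ESX Server Sparse Extent start with the magic "COWD", while the hosted sparse extent starts with "KDMV" (0x564d444b stored little-endian). A minimal sketch of a magic-based classifier, assuming only these two magics matter; classify_vmdk_extent is a hypothetical name, and telling VMDK3 apart from the ESX Server Sparse Extent would require inspecting further header fields.

```python
# Magic numbers as they appear on disk (first four bytes of the file).
VMDK4_MAGIC = b"KDMV"  # hosted sparse extent, uint32 0x564d444b little-endian
COWD_MAGIC = b"COWD"   # VMDK3 sparse extent and ESX Server Sparse Extent


def classify_vmdk_extent(first_bytes: bytes) -> str:
    """Guess the extent kind from the leading bytes of a file."""
    magic = first_bytes[:4]
    if magic == VMDK4_MAGIC:
        return "hosted sparse (VMDK4)"
    if magic == COWD_MAGIC:
        # The magic alone cannot separate VMDK3 from ESX Server sparse.
        return "COWD (VMDK3 or ESX Server sparse)"
    # No magic: could be a text descriptor or a raw flat extent.
    return "unknown (possibly a text descriptor or flat extent)"


print(classify_vmdk_extent(b"KDMV\x01\x00\x00\x00"))
print(classify_vmdk_extent(b"COWD\x01\x00\x00\x00"))
```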

About createTypes

According to the specification there are a dozen VMDK create types, but only about half of them are commonly used. The four types monolithicSparse, monolithicFlat, twoGbMaxExtentSparse and twoGbMaxExtentFlat are siblings, produced by combining two options: “split into 2GB files” and “allocate all space on creation”. Besides these, vmfs is widely used on ESX servers; its format details are not public, so there is not much to dig into. streamOptimized is a format for exporting ESX Server VMs and is discussed later.
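The relationship between the two options and the four sibling types can be written as a small lookup. This is only an illustration of the mapping stated above; the function name create_type and its parameter names are hypothetical.

```python
def create_type(split_2gb: bool, preallocate: bool) -> str:
    """Map the two creation options to the createType name."""
    if split_2gb:
        # Split into 2GB extent files.
        return "twoGbMaxExtentFlat" if preallocate else "twoGbMaxExtentSparse"
    # One single extent file.
    return "monolithicFlat" if preallocate else "monolithicSparse"


for split in (False, True):
    for prealloc in (False, True):
        print(f"split={split!s:5} prealloc={prealloc!s:5} -> {create_type(split, prealloc)}")
```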

Header field: overHead

The overHead field in SparseExtentHeader is explained in the specification as “overHead is the number of sectors occupied by the metadata”. Here “metadata” means everything other than the data grains, including the header and the grain directories and grain tables, all of which are allocated when the image is created. As a result, overHead is also the offset, in sectors, of the first data grain. The specification only implies this, so implementations should take care to honor it.
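A minimal sketch of reading the header and deriving the position of the first data grain from overHead. The struct layout follows the SparseExtentHeader definition in the specification (little-endian, packed, 79 bytes of fields followed by 433 bytes of padding); the helper names and the sample values are hypothetical, and a real reader should validate many more fields.

```python
import struct

SECTOR_SIZE = 512
# Packed little-endian layout of the 79 meaningful bytes of SparseExtentHeader.
HEADER_FMT = "<4sIIQQQQIQQQBccccH"
assert struct.calcsize(HEADER_FMT) == 79

FIELDS = (
    "magic", "version", "flags", "capacity", "grainSize",
    "descriptorOffset", "descriptorSize", "numGTEsPerGT",
    "rgdOffset", "gdOffset", "overHead", "uncleanShutdown",
    "singleEndLineChar", "nonEndLineChar",
    "doubleEndLineChar1", "doubleEndLineChar2", "compressAlgorithm",
)


def parse_sparse_header(sector: bytes) -> dict:
    """Unpack the first sector of a hosted sparse extent into a dict."""
    header = dict(zip(FIELDS, struct.unpack_from(HEADER_FMT, sector, 0)))
    if header["magic"] != b"KDMV":
        raise ValueError("not a hosted sparse extent header")
    return header


def first_grain_byte_offset(header: dict) -> int:
    # overHead counts all metadata sectors (header, descriptor, grain
    # directories and tables), so the first data grain starts right after.
    return header["overHead"] * SECTOR_SIZE


# Synthetic header with illustrative values (not taken from a real image).
demo = struct.pack(HEADER_FMT, b"KDMV", 1, 3, 2097152, 128, 1, 20, 512,
                   21, 4116, 8320, 0, b"\n", b" ", b"\r", b"\n", 0)
demo += bytes(SECTOR_SIZE - len(demo))  # zero padding up to one sector
hdr = parse_sparse_header(demo)
print(first_grain_byte_offset(hdr))  # 8320 * 512 = 4259840
```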

About compression and streamOptimized

Compression is an optional feature of sparse extents; the compression algorithm is RFC 1951 (Deflate). In practice, data is compressed only in streamOptimized images, which also have the grain marker feature enabled. Compression is not used on its own: if we create a monolithicSparse image with header.compressAlgorithm set and write compressed grains, VMware does not decompress them when reading; it simply returns the compressed bytes.
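To make the grain marker concrete, here is a minimal sketch of packing and unpacking one compressed grain, assuming the marker layout in the specification: a 64-bit LBA, then a 32-bit size of the compressed data, then the data itself, with the record padded to a sector boundary. Although the specification names RFC 1951, the sketch writes a zlib-wrapped stream (what zlib.compress produces) and falls back to raw deflate when reading, since which wrapper a given writer uses is worth verifying rather than assuming. The helper names are hypothetical.

```python
import struct
import zlib

SECTOR_SIZE = 512
MARKER_FMT = "<QI"                         # lba (uint64), compressed size (uint32)
MARKER_SIZE = struct.calcsize(MARKER_FMT)  # 12 bytes


def pack_grain(lba: int, grain: bytes) -> bytes:
    """Build one marker + compressed grain record, padded to a sector."""
    payload = zlib.compress(grain)         # zlib-wrapped deflate stream
    record = struct.pack(MARKER_FMT, lba, len(payload)) + payload
    return record + bytes(-len(record) % SECTOR_SIZE)


def unpack_grain(record: bytes) -> tuple:
    """Return (lba, decompressed grain) from one record."""
    lba, size = struct.unpack_from(MARKER_FMT, record, 0)
    payload = record[MARKER_SIZE:MARKER_SIZE + size]
    try:
        data = zlib.decompress(payload)       # stream with a zlib header
    except zlib.error:
        data = zlib.decompress(payload, -15)  # raw RFC 1951 stream
    return lba, data


grain = b"VMDK" * 16384  # one 64 KiB grain (128 sectors) of sample data
record = pack_grain(lba=256, grain=grain)
lba, data = unpack_grain(record)
print(lba, data == grain, len(record) % SECTOR_SIZE)  # 256 True 0
```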

streamOptimized is used mainly for transfer. Although such an image can be attached to a VM in VMware Workstation and read normally, writes are limited to unallocated offsets: any overwrite of existing disk sectors is silently discarded. This can easily lead to inconsistency, so I think the only reasonable use of streamOptimized is to export a local image to this type for upload and download. Once the transfer is complete, the image should be converted back to a hosted disk type (such as monolithicSparse) before booting the guest system.
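The inconsistency can be illustrated with a toy model of the observed write behavior, assuming (for simplicity) that allocation happens per grain of 128 sectors. This is not how VMware implements anything; the class StreamOptimizedWriteModel is hypothetical and only mimics the "first write wins" effect described above.

```python
from typing import Dict, Optional


class StreamOptimizedWriteModel:
    """Toy model: once a grain is written, later writes to it are dropped."""

    def __init__(self, grain_size: int = 128) -> None:
        self.grain_size = grain_size        # sectors per grain (assumed)
        self.grains: Dict[int, bytes] = {}  # grain index -> stored data

    def write(self, lba: int, data: bytes) -> bool:
        """Store data for the grain containing lba.

        Returns False when the write is dropped; a real image gives the
        guest no such signal, which is what makes the loss silent.
        """
        index = lba // self.grain_size
        if index in self.grains:
            return False  # already allocated: overwrite is discarded
        self.grains[index] = data
        return True

    def read(self, lba: int) -> Optional[bytes]:
        return self.grains.get(lba // self.grain_size)


disk = StreamOptimizedWriteModel()
disk.write(0, b"old contents")
disk.write(0, b"new contents")  # the guest believes this succeeded
print(disk.read(0))             # b'old contents'
```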

Special Images

Some images do not strictly follow the specification but have been found to work with VMware. One of these is the Haiku VMDK image (VM mirrors at http://haiku-os.org/get-haiku). It consists of a hosted sparse extent header, followed by a descriptor file, followed by flat data. The header fields capacity, gdOffset and rgdOffset are zero, and the embedded descriptor declares a FLAT extent that points to the file itself with an offset of 128 sectors, so the data starts at byte offset 0x10000 (128 × 512). Combining a sparse header with flat data is unusual and clearly outside what the specification defines; nevertheless, both VMware and VirtualBox support it. Furthermore, in my tests VMware attaches the image correctly only if the .vmdk file name matches the one in the descriptor, just as with a descriptor file plus a separate flat extent. There was also a QEMU bug report about this.
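A minimal sketch of how a reader might interpret such an extent line: parse the access mode, size in sectors, extent type, quoted file name and sector offset, then convert the offset to bytes and check whether the extent points back to the image file itself. The file name haiku.vmdk and the size 1048576 are hypothetical values, and the helper names are illustrative.

```python
import os
import shlex
from dataclasses import dataclass

SECTOR_SIZE = 512


@dataclass
class ExtentLine:
    access: str     # e.g. RW or RDONLY
    sectors: int    # extent size in sectors
    kind: str       # e.g. FLAT or SPARSE
    filename: str   # quoted file name in the descriptor
    offset: int = 0  # start offset in sectors (used by FLAT extents)


def parse_extent_line(line: str) -> ExtentLine:
    """Split an extent description line, honoring the quoted file name."""
    parts = shlex.split(line)
    access, sectors, kind, filename = parts[:4]
    offset = int(parts[4]) if len(parts) > 4 else 0
    return ExtentLine(access, int(sectors), kind, filename, offset)


def points_to_itself(image_path: str, ext: ExtentLine) -> bool:
    # In the author's tests VMware required this name match.
    return os.path.basename(image_path) == ext.filename


ext = parse_extent_line('RW 1048576 FLAT "haiku.vmdk" 128')
print(hex(ext.offset * SECTOR_SIZE))            # 0x10000
print(points_to_itself("/vms/haiku.vmdk", ext))  # True
```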

vmware-mount

VMware VDDK contains a tool, ‘vmware-mount’, that can mount a partition of a VMDK disk on the host file system. It uses FUSE to do so. Its internals are described here.

Virt Block