By Mike Ivanov
Server virtualization, as popularized by VMware, Microsoft, and others, is a widely used tactic for reducing data center costs. By cutting the number of physical servers, data centers have significantly shrunk their footprints and lowered the costs of server acquisition, energy, management, and more. While server-related costs have decreased, the corresponding storage costs have not; in fact, storage costs are rising rapidly as a result of server virtualization.
At first glance it is not obvious why storage costs have increased, given that the cost per GB has consistently fallen. The answer lies in the management of virtualized servers. This article reviews the popular VMware vSphere Hypervisor (ESXi) and how its total storage consumption and corresponding costs can be dramatically reduced by data deduplication.
VMware Storage
A VMware ESXi virtual machine uses a virtual disk (VMDK) to store its operating system, program files, and other data associated with its activities (Figure 1). The VMDK is a large physical file, or set of files, that can be moved, deleted, and copied as easily as any other file. To store and manage virtual disks, VMware vSphere uses its own special storage space called a VMFS datastore, which is similar to a file system on a logical volume. A VMFS datastore can be created on a wide variety of physical storage devices, including internal, external, and networked storage.

VMware Storage Sprawl
Storage consumption is the number one VMware issue for storage administrators because of the volume of disk space involved and its associated costs. Virtual machine “clones” are easy to create, and VMware storage sprawl is a direct result. Because new virtual machines no longer require additional physical servers, VMware administrators now create extra virtual machines for testing patches, bug fixes, and operating system service packs, and for internal departments (e.g., engineering, support, and marketing). Depending on how the virtual servers' storage is managed, the amount of disk space needed for all virtual machines and their support files can quickly become staggering.
Primary Storage Data Deduplication Software
Implementing primary storage data deduplication software is the answer to VMware storage sprawl. Data deduplication software identifies duplicate chunks of data so that each unique chunk is stored only once. When applied to VMFS datastores, it can significantly reduce the amount of disk space needed to store virtual machines. Virtual machines are ideal candidates for data reduction because they so often contain identical operating systems and applications. As a result, the actual differences between virtual disks are quite small, and the disks lend themselves to significant data reduction via deduplication technology.
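The idea of storing each unique chunk only once can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the fixed 4 KB chunk size, the use of SHA-256 as the chunk fingerprint, and the names DedupStore, write, read, logical_bytes, and physical_bytes are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (hypothetical choice)


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept only once."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> chunk bytes (stored once)
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        """Split data into chunks and store only chunks not seen before."""
        fingerprints = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fingerprint, chunk)
            fingerprints.append(fingerprint)
        self.files[name] = fingerprints

    def read(self, name):
        """Rebuild a file from its ordered list of chunk fingerprints."""
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def logical_bytes(self):
        """Bytes the files would occupy without deduplication."""
        return sum(len(self.chunks[fp])
                   for fps in self.files.values() for fp in fps)

    def physical_bytes(self):
        """Bytes actually stored after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


# Three simulated "virtual disks": eight shared chunks standing in for a
# common OS image, plus one chunk that is unique to each virtual machine.
base = b"".join(bytes([i]) * CHUNK_SIZE for i in range(8))
store = DedupStore()
for name, tag in (("vm1.vmdk", b"A"), ("vm2.vmdk", b"B"), ("vm3.vmdk", b"C")):
    store.write(name, base + tag * CHUNK_SIZE)

print(store.logical_bytes(), store.physical_bytes())  # 110592 45056
```

In this toy case the three disks hold 27 chunks of logical data but only 11 unique chunks are stored, because the eight shared chunks are kept once rather than three times. Real virtual disks differ in more places than this, so the savings in practice depend on how similar the virtual machines actually are.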
Figure 2 illustrates this basic storage reduction technique. In Figure 2, three virtual machines with the same operating system and application software are shown, each with its own virtual disk storage. When data deduplication is applied, the three virtual disks are reduced to one. Virtual server storage clearly benefits from storage deduplication. Storage deduplication also benefits virtual machine performance: blocks shared by several virtual machines can be deduplicated in cache, so each virtual machine needs fewer disk accesses and runs faster.
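The performance point can be illustrated with a toy read cache that is keyed by a block's content fingerprint instead of by virtual machine and block number. This is only a sketch under stated assumptions: the class DedupReadCache, the block_map and backing_store dictionaries, and the fingerprint labels are hypothetical, and the cache never evicts anything.

```python
class DedupReadCache:
    """Toy read cache keyed by content fingerprint, not by (VM, block)."""

    def __init__(self, block_map, backing_store):
        self.block_map = block_map          # (vm, block_no) -> fingerprint
        self.backing_store = backing_store  # fingerprint -> bytes (the "disk")
        self.cache = {}                     # fingerprint -> bytes (in memory)
        self.disk_reads = 0

    def read(self, vm, block_no):
        """Return a block, going to disk only if its content is not cached."""
        fingerprint = self.block_map[(vm, block_no)]
        if fingerprint not in self.cache:
            self.disk_reads += 1
            self.cache[fingerprint] = self.backing_store[fingerprint]
        return self.cache[fingerprint]


# Hypothetical layout: blocks 0-2 hold the shared OS image and block 3
# holds data that is unique to each virtual machine.
vms = ("vm1", "vm2", "vm3")
backing_store = {"os0": b"kernel", "os1": b"libraries", "os2": b"application"}
block_map = {}
for vm in vms:
    for block_no in range(3):
        block_map[(vm, block_no)] = "os%d" % block_no
    backing_store["data-" + vm] = ("private data of " + vm).encode()
    block_map[(vm, 3)] = "data-" + vm

cache = DedupReadCache(block_map, backing_store)
total_reads = 0
for vm in vms:
    for block_no in range(4):
        cache.read(vm, block_no)
        total_reads += 1

print(total_reads, cache.disk_reads)  # 12 6
```

Of the 12 block reads issued by the three virtual machines, only 6 reach the disk: the three shared OS blocks are fetched once for the first virtual machine and then served from cache for the other two.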
Conclusion
Primary storage data deduplication is a valuable technology for reducing the storage costs associated with virtual servers. By eliminating the duplicate data in each VMDK, VMware administrators can deploy virtual machines as needed without consuming massive amounts of disk storage. Old VMware images can be stored on deduplicated disk volumes to save further disk space. Multiple storage vendors offer data deduplication for primary storage, and their offerings should be considered as a way to reduce VMware management costs.
Mike Ivanov is the Vice President of Marketing at Permabit Technology Corporation (http://www.permabit.com)

