Cloud Computing
By Anand Babu (AB) Periasamy
In the initial wave of server virtualization, the focus was primarily on consolidation and efficiency. The first applications to move to virtual machines (VMs) were development environments or lightweight workloads that were mostly static and not very I/O intensive, such as DHCP, DNS and Active Directory. Servers primarily used block-based Storage Area Networks (SANs) for networked disk storage, and initially that did not change.
As the use of server virtualization increased, it began to strain the supporting SAN infrastructure. Every VM required a dedicated Logical Unit Number (LUN) to be provisioned, and as the number of VMs exploded, the growing number of individual LUNs created management and scalability problems. VMware developed its Virtual Machine File System (VMFS) specifically to address these SAN issues for virtual machines, enabling larger (but still limited) LUNs that could store data from multiple VMs. This is an early example of storage virtualization, and it highlights a challenge that continues today: storage playing catch-up with virtualization advances on the server side.
As server virtualization technology has matured, enterprise applications now run in VMs in an extremely dynamic manner. Alongside continued management complexity, I/O bottlenecks and the cost of storage for virtual servers have emerged as the top issues. Network Attached Storage (NAS) and iSCSI are now viable solutions for VM storage. iSCSI offers ease-of-use and cost advantages over traditional Fibre Channel (FC) SAN, but as block storage it suffers from many of the same issues. NAS offers similar cost and simplicity advantages, and its inherent scalability and sharing capabilities position it to be the storage platform of choice for the cloud. Data center managers are now moving full speed to virtualize everything and adopt the cloud model, which makes it even more important to explore how NAS can efficiently enable cloud storage.
Limitations of SAN
Scalability limitations: SAN is block storage, so a scalable file system is needed to manage the data within the SAN. With VMware, VMFS allows virtual disks to be stored as files. The maximum volume size is 64TB, with virtual disks of up to 2TB each, although I/O will most likely become a bottleneck before these limits are reached. For application-generated data the situation is worse. Without a global namespace, SAN has no easy way to manage the hundreds of terabytes, or even petabytes, of data created by cloud-based applications. Without a scalable, high-performance storage architecture, cloud computing that relies on SAN will stall.
Manageability challenges: In a relatively static environment, provisioning and managing LUNs is a reasonable task. A cloud environment, where hundreds of VMs can be provisioned in minutes, is extremely dynamic. Provisioning, backup and recovery become extremely complex across a large number of LUNs. SAN administration also requires significant expertise, such as combining multiple extents to create a VMFS volume larger than 2TB, or implementing raw device mappings (RDMs) for performance. Given that cloud benefits can only be achieved with large-scale automation, this type of complexity can be a significant inhibitor.
Data sharing: LUNs have to be accessed concurrently by physical and virtual machines for shared hosting of virtual disks and application data. Even with VMware, VMFS addresses only the shared virtual disk storage requirement, and other hypervisors such as Hyper-V, Xen and KVM lack a VMFS equivalent. Cloud applications require shared access to the data partition for clustering, and SAN by itself does not offer sharing functionality, so it falls short.
Cost: Cloud architects consider $0.50 per GB to be expensive storage. iSCSI is a cheaper alternative to FC SAN, but its cost is still not low enough to justify the economics of cloud storage. SAN was designed for mission-critical database environments with full redundancy at the hardware level, so it is unfair to expect it to be cheap. Logically, storage must be addressed as a software problem using a commodity scale-out architecture, as Google and Amazon have done.
Scalable NAS file systems can use SAN or Direct Attached Storage (DAS) as a complementary disk back-end and overcome several of these problems. A SAN back-end brings mission-critical availability and uses fewer disks, with the resulting power, cooling and real estate savings. DAS-based cloud storage does require data to be replicated twice (or more), but it is primarily a software-based solution that is cost-effective and flexible. The two main factors in choosing between them are cost and expertise.
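The extent arithmetic behind the 2TB limit can be sketched in a few lines. This is a minimal illustration that assumes the limits quoted in this article (2TB per extent, 64TB per volume) and treats each extent as exactly 2TB, which is a simplification; the function name `extents_needed` is hypothetical.

```python
import math

# Limits as quoted in the article: VMFS volumes up to 64TB, built from
# extents of at most 2TB each. Treating an extent as exactly 2TB is a
# simplification for illustration.
MAX_EXTENT_TB = 2
MAX_VOLUME_TB = 64


def extents_needed(volume_tb: float) -> int:
    """Return how many 2TB extents must be combined to build a VMFS volume."""
    if volume_tb <= 0:
        raise ValueError("volume size must be positive")
    if volume_tb > MAX_VOLUME_TB:
        raise ValueError(f"{volume_tb}TB exceeds the {MAX_VOLUME_TB}TB volume limit")
    return math.ceil(volume_tb / MAX_EXTENT_TB)


if __name__ == "__main__":
    for size in (1.5, 10, 64):
        print(f"{size}TB volume -> {extents_needed(size)} extent(s)")
```

Each extent is a separate LUN to provision and track, so under these assumptions a single 64TB volume means managing 32 of them.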
NAS Advantages
NAS Can Scale to Petabytes: Unstructured data (files and folders) is exploding. It is predicted that by the end of 2010, 1,200 exabytes of data will be created. NAS can scale seamlessly whether you require a thousand virtual disk images, a petabyte of data, or both. A scalable NAS solution with a unified global namespace that automatically load-balances data across multiple storage servers is ideal for such an environment.
Shared Data Access: NAS volumes can be mounted simultaneously across thousands of servers, allowing both the hypervisor and cloud applications to share the same storage. This allows VMs to migrate across a large pool of servers without concern for storage access. Sharing is even more crucial for application data, as nearly all cloud applications require the data partition to be shared. NAS lets multiple VMs access data concurrently to distribute I/O, and allows simultaneous read/write access to files. Even a simple setup of VMs serving HTML and image files requires a shared volume. Separate NAS volumes are required only for multitenancy (partitioning different groups of applications and users), and fewer volumes mean far fewer administrative tasks.
Cost and Ease of Use: Compared to FC SAN, NAS systems using protocols such as NFS and CIFS are simpler to deploy and use. VM disk images are simply files on a NAS volume, and backup and recovery is a familiar process. Scalable NAS solutions with software redundancy can start as low as $8,000 for 24TB of storage.
High Performance: NAS performance on 1GigE networks was sufficient for most users, and NAS now supports 10GigE and 40G InfiniBand networking, both faster than 8G FC. In addition, NAS scales across multiple storage servers to deliver hundreds of gigabits per second of throughput at 10µs latency. SAN vendors themselves are favoring Ethernet over FC today.
Replication and thin provisioning are important additional features. Scalable NAS solutions support built-in mirroring to survive hardware failure, and replication with snapshots and cloning can support disaster recovery. Thin provisioning allocates disk capacity on demand, resulting in efficient utilization of disk space. With NAS, thin provisioning comes by default, and NAS volumes are easy to expand on the fly.
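The article does not describe how a global namespace spreads files across storage servers. One common family of techniques hashes each file path to a server so that no central lookup table is needed. The sketch below uses rendezvous (highest-random-weight) hashing, a technique swapped in for illustration; it is not necessarily how GlusterFS or any other product named here places data, and the server names and paths are hypothetical.

```python
import hashlib


def _score(server: str, path: str) -> int:
    # Stable 64-bit score for a (server, path) pair. Unlike Python's built-in
    # hash(), SHA-256 gives the same result in every process and on every host.
    digest = hashlib.sha256(f"{server}:{path}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(path: str, servers: list[str]) -> str:
    """Pick the server that stores `path` using rendezvous (HRW) hashing."""
    if not servers:
        raise ValueError("need at least one storage server")
    return max(servers, key=lambda s: _score(s, path))


if __name__ == "__main__":
    servers = ["storage1", "storage2", "storage3"]
    files = [f"/vol/images/vm{i:04d}.img" for i in range(1000)]
    before = {f: place(f, servers) for f in files}

    # Grow the pool: a file moves only if storage4 now has its highest
    # score, so every moved file lands on the new server.
    after = {f: place(f, servers + ["storage4"]) for f in files}
    moved = [f for f in files if before[f] != after[f]]
    print(f"{len(moved)} of {len(files)} files moved, all to storage4:",
          all(after[f] == "storage4" for f in moved))
```

The design choice here is that adding a server relocates only the share of files the new server wins (roughly a quarter when going from three servers to four), instead of reshuffling everything as a plain `hash % n` scheme would.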
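To set the $8,000 figure beside the $0.50 per GB benchmark cited earlier, a quick calculation helps. The sketch assumes decimal units (1TB = 1,000GB) and treats 24TB as raw capacity that is divided by the number of copies kept for software redundancy. The article does not say whether 24TB is raw or usable, so the multi-copy results are an assumption, not a quoted price. The function name is hypothetical.

```python
# Figures from the article: $8,000 for 24TB, and $0.50/GB as "expensive".
# Assumptions (not stated in the article): decimal units, and 24TB is raw
# capacity, so usable capacity shrinks with each extra copy of the data.
PRICE_USD = 8_000
RAW_TB = 24
EXPENSIVE_USD_PER_GB = 0.50


def usd_per_usable_gb(price_usd: float, raw_tb: float, copies: int = 1) -> float:
    """Cost per usable GB when every byte is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    usable_gb = raw_tb * 1_000 / copies
    return price_usd / usable_gb


if __name__ == "__main__":
    for copies in (1, 2, 3):
        cost = usd_per_usable_gb(PRICE_USD, RAW_TB, copies)
        verdict = "under" if cost < EXPENSIVE_USD_PER_GB else "over"
        print(f"{copies} copy(ies): ${cost:.2f}/GB ({verdict} $0.50/GB)")
```

Under these assumptions a single copy works out to about $0.33 per GB, two copies to about $0.67 and three copies to $1.00, so the replication factor matters as much as the headline price.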
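The "hundreds of gigabits" figure comes from scale-out: aggregate bandwidth grows with the number of storage servers. The sketch below is an upper-bound estimate that assumes one network port per server, linear scaling and nominal line rates. It ignores encoding and protocol overhead as well as switch limits, and the server counts are hypothetical.

```python
import math

# Nominal per-port line rates in Gb/s for the links named in the article.
# Real payload throughput is lower because of encoding and protocol overhead.
LINK_GBPS = {"1GigE": 1, "8G FC": 8, "10GigE": 10, "40G InfiniBand": 40}


def aggregate_gbps(servers: int, link: str) -> int:
    """Upper bound on throughput when each server contributes one link."""
    return servers * LINK_GBPS[link]


def servers_for(target_gbps: float, link: str) -> int:
    """Smallest number of single-link servers whose combined line rate meets the target."""
    return math.ceil(target_gbps / LINK_GBPS[link])


if __name__ == "__main__":
    print(aggregate_gbps(20, "10GigE"))        # 200
    print(servers_for(200, "40G InfiniBand"))  # 5
```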
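Thin provisioning can be illustrated with a toy in-memory model. The volume advertises its full size but only consumes space for blocks that have actually been written, and growing it simply raises the advertised size. `ThinVolume` is a hypothetical class for illustration, not how any particular NAS file system implements the feature.

```python
class ThinVolume:
    """Toy thin-provisioned volume: blocks get physical space on first write."""

    def __init__(self, virtual_blocks: int, block_size: int = 4096):
        self.virtual_blocks = virtual_blocks
        self.block_size = block_size
        self._blocks: dict[int, bytes] = {}  # only written blocks are stored

    def _check(self, index: int) -> None:
        if not 0 <= index < self.virtual_blocks:
            raise IndexError("block outside the volume")

    def write(self, index: int, data: bytes) -> None:
        self._check(index)
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        # Space is allocated here, on demand, padded to a full block.
        self._blocks[index] = data.ljust(self.block_size, b"\0")

    def read(self, index: int) -> bytes:
        self._check(index)
        # Never-written blocks read back as zeros without consuming space.
        return self._blocks.get(index, bytes(self.block_size))

    def grow(self, extra_blocks: int) -> None:
        # Expanding on the fly only changes the advertised size.
        self.virtual_blocks += extra_blocks

    @property
    def provisioned_bytes(self) -> int:
        return self.virtual_blocks * self.block_size

    @property
    def allocated_bytes(self) -> int:
        return len(self._blocks) * self.block_size


if __name__ == "__main__":
    vol = ThinVolume(virtual_blocks=1_000_000)  # about 4GB advertised
    vol.write(42, b"hello")
    print(vol.provisioned_bytes, vol.allocated_bytes)  # 4096000000 4096
```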
Modern NAS Protocols
After a decade of relative quiet, NAS is currently experiencing a burst of innovation. NAS file systems such as GlusterFS, Isilon OneFS, IBM SONAS and Panasas PanFS allow NFS, CIFS and direct-access NAS protocols to scale to very large capacity and high performance. Because they are POSIX-compliant, applications can run unmodified. There is also a new class of storage protocols broadly classified under NoSQL. These fall between the NAS and SQL models and are mostly based on key-value or distributed object storage APIs. Applications must be modified to take advantage of these APIs, but the effort can be worthwhile. Examples include Apache Hadoop, Apache Cassandra and Amazon S3.
There are multiple reasons why NAS is destined to play a leading role in cloud storage. The explosion of unstructured data can only be handled by NAS technology. InfiniBand and 10GbE have taken NAS ahead of SAN in performance and scalability. Compute and storage networking will converge with the emerging Ethernet Data Center Bridging (DCB) standards. While SAN must adopt Ethernet-based protocols such as iSCSI or FCoE, NAS will naturally benefit from TCP/IP and RDMA capabilities. SAN will complement NAS as back-end storage and carve out its own share of mission-critical storage, but ultimately the cloud will drive commoditization and the move to NAS.
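The difference between running unmodified over POSIX and rewriting for a key-value API can be sketched side by side. `KeyValueStore`, `save_posix` and `save_kv` are hypothetical names; the store is an in-memory stand-in for the object-storage style and does not reproduce the actual APIs of Amazon S3, Cassandra or Hadoop.

```python
import tempfile
from pathlib import Path


class KeyValueStore:
    """Hypothetical in-memory object store with a flat put/get interface."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._objects[key] = value  # whole-object replace, no partial writes

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys_with_prefix(self, prefix: str) -> list[str]:
        # "Directories" are only a naming convention inside flat keys.
        return sorted(k for k in self._objects if k.startswith(prefix))


def save_posix(root: Path, name: str, data: bytes) -> None:
    # Unmodified application code: ordinary file I/O on a mounted volume.
    path = root / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)


def save_kv(store: KeyValueStore, name: str, data: bytes) -> None:
    # The same intent, rewritten against the put/get interface.
    store.put(name, data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_posix(Path(tmp), "logs/app.log", b"started\n")
        print((Path(tmp) / "logs/app.log").read_bytes())
    kv = KeyValueStore()
    save_kv(kv, "logs/app.log", b"started\n")
    print(kv.keys_with_prefix("logs/"))
```

The POSIX path works on a mounted NAS volume as-is, while the key-value path requires the application's storage calls to be rewritten, which is the modification the paragraph refers to.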
Anand Babu (AB) Periasamy is co-founder and CTO at Gluster.

