Tape in the Cloud

by Chris Marsh

This article is the first in a three-part series examining the role of tape in the cloud, data integrity verification in the cloud, and archiving in the cloud.

A surprising number of people believe that cloud storage is synonymous with disk (primarily performance, or Tier-1, disk) and deduplication. This is simply not the case. Given this misconception, it is understandable that the recent revelation that Google had to restore Gmail account data from tape was met with both controversy and surprise. Regardless of perceptions, however, tape continues to play a vital role in cloud storage, cloud backup, and cloud archiving. To understand why, it is important first to understand the critical hurdles that tape helps a cloud service provider overcome in an online, high-availability environment such as the cloud.

The term “cloud” has become such a household word that it is often difficult to tell which aspect or type of cloud service is being discussed when the term is used. Here, cloud storage is used as an inclusive term: it covers private, hybrid, and public cloud architectures, whether the storage is used for primary, archival, or backup data. Essentially, “cloud” refers to the practice of storing and accessing company data across a wide-area, and sometimes public, network, whether that data is managed internally (private cloud) or by a third party (public cloud). Regardless of implementation, cloud services are fundamentally a cost model. For a company consuming the cloud as a service, it provides flexibility in both capacity and performance without large up-front capital expenses. For cloud service providers, systems are built around flexibility: delivering the level of protection and service promised in their service level agreements (SLAs) while maximizing profitability.

Any discussion of the cloud should cover several characteristics, including multi-tenancy, security, data integrity verification, retrieval expectations, and exit strategies. In this series, all of these concepts will be addressed as they pertain to tape. Assured deletion, the ability to guarantee that all copies of a set of data have been deleted, will not be addressed. A little reflection makes it clear that assured deletion is a virtually impossible standard to meet in any online, or even digital, environment, because most systems are designed to create copies easily, not to limit copies or ensure their deletion.

Multi-tenancy, ensuring that each customer’s data is kept separate and inaccessible to anyone without access privileges, is at the very heart of cloud storage. For an individual using the cloud for photos or personal backups, this may not be critical. However, for any organization sending legal or otherwise sensitive information to the cloud, it is critical that the data be kept secure and inaccessible to any other party, whether through accident or malicious attack.

Multi-tenancy is often managed at both the hardware and software levels, regardless of the storage tier on which the data resides. At the hardware level, however, it can be more subtly difficult than it seems. The very features designed to protect and condense data can inadvertently complicate isolation. Deduplication, RAID, and thin provisioning are all valuable disk features that provide cost savings, protection, or both. Unfortunately, these techniques often break data up at the block level and redistribute it across devices, or discard duplicate blocks and track the remaining references in metadata tables. As a result, it can be extremely difficult to identify exactly where a given file resides within a storage system. Even on a single hard drive, running a disk defragmenter on your own computer shows how scattered the pieces of individual files can become. Many systems must bypass their own optimizations to keep each customer’s data isolated from every other customer’s.
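To make the deduplication point concrete, here is a minimal sketch of a content-addressed block store. It assumes fixed-size 4 KiB blocks identified by SHA-256 hashes and deduplication applied globally across tenants; the class and method names are hypothetical, and real deduplication engines differ considerably (some deduplicate only within a tenant). When two tenants store files with common content, a single physical block ends up referenced by both, and each file exists only as a list of references in a metadata table.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumption: fixed-size 4 KiB blocks


class DedupStore:
    """Hypothetical content-addressed block store illustrating global deduplication."""

    def __init__(self):
        self.blocks = {}                # block hash -> bytes (one physical copy only)
        self.files = {}                 # (tenant, filename) -> list of block hashes
        self.owners = defaultdict(set)  # block hash -> tenants referencing that block

    def write(self, tenant, name, data):
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            chunk = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Keep the block only if identical content has not been stored before
            self.blocks.setdefault(digest, chunk)
            self.owners[digest].add(tenant)
            refs.append(digest)
        self.files[(tenant, name)] = refs

    def read(self, tenant, name):
        # A file is reassembled purely from its metadata list of block references
        return b"".join(self.blocks[h] for h in self.files[(tenant, name)])

    def shared_blocks(self):
        # Physical blocks that belong to more than one tenant at the same time
        return {h for h, tenants in self.owners.items() if len(tenants) > 1}


store = DedupStore()
common = b"A" * BLOCK_SIZE  # e.g. identical boilerplate both tenants happen to store
store.write("tenant_a", "report.doc", common + b"private to A")
store.write("tenant_b", "invoice.doc", common + b"private to B")
print(len(store.blocks))           # → 3 (physical blocks backing 4 logical blocks)
print(len(store.shared_blocks()))  # → 1 (block owned by both tenants)
```

Deleting tenant A's file here could not simply erase the shared block without breaking tenant B's file, which is one reason isolating a single customer's data on deduplicated disk is harder than it first appears.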

On tape, however, this issue is much simpler to address. Each LTO cartridge is a physically separate medium. While compression increases a cartridge’s effective capacity, the end user or service provider ultimately has complete control over which data ends up on any given tape, which provides natural separation. For customers who require further separation, many tape libraries can be divided into hard partitions, each presented to the network as an individual library with no access to the others. As such, tape is an ideal medium for any data that is not constantly accessed and modified.

Tape provides another technical differentiator from disk systems that addresses both security and data protection concerns: a completely offline copy of data. In the age of smartphones, Wi-Fi, satellite links, and other always-connected technologies, it is easy to overlook the value of an offline copy. However, the fact remains that any online data is susceptible to corruption, malicious attack, and even accidental deletion. Why? Online data, by definition, must be readily accessible. Data protection is always a series of calculated risks, and even technologies such as snapshots, continuous data protection (CDP), replication, and RAID are not absolute guarantees of security. If a file is corrupted by any means, the corruption can be replicated, mirrored, and backed up along with it. The Gmail bug of February 2011 demonstrates that even a software glitch can have a catastrophic impact on online data.
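The propagation problem can be illustrated with a minimal sketch, assuming a naive mirror that copies whatever the primary currently holds. The names are hypothetical, and real replication and CDP products are far more sophisticated, but the core behavior is the same: a mirror faithfully reproduces a bad write, while a copy taken earlier and then removed from the system (standing in for an ejected tape) is untouched.

```python
import copy

primary = {"mailbox.db": b"good data"}
replica = {}


def replicate(source, target):
    # A mirror copies the current state of the source, whether good or bad
    target.clear()
    target.update(source)


replicate(primary, replica)
offline_copy = copy.deepcopy(primary)  # e.g. written to tape, then ejected from the library

primary["mailbox.db"] = b"\x00corrupted\x00"  # a software bug overwrites the file
replicate(primary, replica)                  # the corruption is mirrored immediately

print(replica["mailbox.db"])       # the corrupted bytes, identical to the primary
print(offline_copy["mailbox.db"])  # → b'good data'
```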

Fortunately, tape is removable, and corruption or an attack within a system has no way to reach a tape that is not loaded in a library. Therefore, as long as basic best practices for tape storage are followed, tape remains the silent defender of data at the back end of any intelligently designed storage system. Another advantage of tape is that extra copies are much less expensive to create and maintain than on comparable disk systems, so multiple copies can be kept in multiple locations and, if one copy is lost or damaged, a secondary copy can restore the data without loss.

Tape also provides a capability essential to cloud storage that is rarely discussed: an exit strategy. If a customer with large amounts of data needs to leave the cloud, or must retrieve more data than the available bandwidth can move in a reasonable time, removable media can be shipped back to the customer without loss of data. Media can likewise be moved physically to another location, providing a clean migration of data without requiring extensive bandwidth between the service provider and the customer.
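A rough calculation shows why bandwidth becomes the bottleneck. The sketch below estimates how long a bulk transfer would take over a network link; the 500 TB data set, the link speeds, and the 80% sustained utilization are illustrative assumptions, not figures from any particular provider.

```python
def transfer_days(data_tb, link_gbps, utilization=0.8):
    """Estimate days to move data_tb terabytes (decimal, 10**12 bytes) over a link.

    Assumption: the link sustains `utilization` of its nominal rate for the whole
    transfer, with no protocol overhead, retries, or competing traffic beyond that.
    """
    bits = data_tb * 10**12 * 8
    effective_bps = link_gbps * 10**9 * utilization
    seconds = bits / effective_bps
    return seconds / 86_400  # seconds per day


for link in (0.1, 1, 10):
    print(f"500 TB over {link} Gbit/s: {transfer_days(500, link):.1f} days")
# → 500 TB over 0.1 Gbit/s: 578.7 days
# → 500 TB over 1 Gbit/s: 57.9 days
# → 500 TB over 10 Gbit/s: 5.8 days
```

Network transfer time grows linearly with data volume, whereas the time to ship a box of cartridges depends on the courier, not on how much data the cartridges hold.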

While I would love to see every cloud service provider evangelizing the merits of tape storage, the fact remains that they do not, and the misconception that tape is not used in the cloud persists. For decades, disk storage vendors have been proclaiming the death of tape. Why? Tape threatens their disk sales. Tape’s affordability, portability, density, and power efficiency help keep costs in check.

Cloud storage providers often implement a tiered storage approach, delivering front-end performance to customers on performance disk while relying on tape at the back end. This is an effective way to meet customer requirements while improving the provider’s profitability. So, whether it is viewed as a business or a technology decision, tape adds value to cloud storage, and its role should be considered when evaluating any cloud storage service provider.
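The tiered approach described above can be sketched as a simple placement policy. This is a minimal illustration, assuming an age-based rule with a hypothetical 30-day threshold; the `FileRecord` type and `apply_tiering` function are invented for the example, and production hierarchical storage management systems use richer policies (size, customer SLA, recall frequency) and verify each tape copy before freeing disk space.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: files untouched for more than 30 days move to tape (illustrative only)
TAPE_AFTER = timedelta(days=30)


@dataclass
class FileRecord:
    name: str
    last_access: datetime
    tier: str = "disk"


def apply_tiering(files, now):
    """Move files not accessed within TAPE_AFTER from disk to tape; return moved names."""
    moved = []
    for f in files:
        if f.tier == "disk" and now - f.last_access > TAPE_AFTER:
            # A real system would copy to tape, verify the copy, then release disk space
            f.tier = "tape"
            moved.append(f.name)
    return moved


now = datetime(2011, 6, 1)
files = [
    FileRecord("active.db", now - timedelta(days=2)),
    FileRecord("q4_2010_backup.tar", now - timedelta(days=120)),
]
print(apply_tiering(files, now))  # → ['q4_2010_backup.tar']
```

Frequently accessed data stays on performance disk, while cold data lands on the cheaper, offline-capable tier.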

Chris Marsh is the IT market and development manager at Spectra Logic (Boulder, CO). www.spectralogic.com