
A New RAID for Cloud-Scale Data Integrity

The challenges of scaling data storage now confront IT professionals in every industry, not just those running a few niche, data-intensive applications. The scaling problem typically shows up as sub-linear scaling, data corruption or data loss, because the probability of experiencing a failure rises as the population of hard drives grows.

An Overview of RAID

One of the most common tools in the IT professional’s arsenal for managing large data sets is RAID. RAID is the encoding of data across a group of multiple hard drives managed as a single set, such that you don’t lose data when a single disk fails. Different RAID implementations have different degrees of redundancy and performance characteristics, based on how the data is laid out.

RAID-5

Figure 1 is an example of a RAID-5 layout across 5 disks. RAID-5 is an “N+1” architecture. This is a 4+1 layout, meaning that 4 disks’ worth of data are encoded onto 5 disks, such that as long as any 4 disks are available, the data remains available and recoverable.


Figure 1: RAID-5, 4+1 configuration protects against data loss through the failure of any single disk.

In this example, if disk 5 were to fail, logical blocks 8, 12, 16, and 20 would still be available by reading the entire remaining stripe, and using the parity calculation to retrieve the value.
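The parity calculation mentioned above can be sketched in a few lines. This is a minimal illustration, assuming simple XOR parity over one stripe of toy 4-byte blocks; the helper name `xor_blocks` and the block contents are made up for the example and do not correspond to the block numbers in Figure 1.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Four data blocks in one stripe (toy sizes; real blocks are far larger).
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[2]: rebuild it by XOR-ing
# the three surviving data blocks with the parity block.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data[2])
```

Because XOR is its own inverse, the same routine both computes the parity and recovers any one missing block, which is why a single lost disk costs no data.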

RAID-5 has one major non-obvious limitation. It has no defense against “bit rot,” or “silent data corruption.” Silent data corruption is the case where the computer issues a read command to the hard drive and the hard drive returns data that is different from what was originally written, i.e., the data is corrupt, yet no errors are raised by the disk, either prior to or during the read. The lack of an error condition is why it is called “silent” data corruption.

These corruption events can be transient (meaning you can re-read the block and it will be OK) or can indicate permanent data corruption. Hard drive manufacturers express these corruption events as a Bit Error Rate, and helpfully publish specifications, typically 1 in 10^14 bits read for consumer-level drives and 1 in 10^15 bits read for enterprise drives. Causes leading to a nonzero bit error rate include: (a) misdirected writes, where the hard drive writes the data correctly but in the wrong place (now two sectors are corrupt!); (b) torn writes, where the hard drive doesn’t finish a write (for example, it lost power while writing); and (c) non-writes, where the hard drive says it wrote data but didn’t.
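To get a feel for what these rates mean at scale, here is a rough back-of-the-envelope sketch. It assumes the published figure is read as an error probability per bit read and that errors are independent, which is a simplification; the 12 TB read size and the function name are illustrative choices, not figures from any vendor.

```python
import math

def p_at_least_one_error(bytes_read, bit_error_rate):
    """Probability of one or more bit errors, assuming independent errors per bit."""
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-bit_error_rate))

TB = 10**12  # decimal terabyte
consumer = p_at_least_one_error(12 * TB, 1e-14)
enterprise = p_at_least_one_error(12 * TB, 1e-15)

print(f"consumer   (1 in 10^14): {consumer:.1%}")
print(f"enterprise (1 in 10^15): {enterprise:.1%}")
```

Under these assumptions, reading 12 TB end to end has roughly a 62% chance of hitting at least one bit error at the consumer rate and roughly a 9% chance at the enterprise rate, which is why the problem grows with the size of the data set.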

It’s worth noting that RAID-5, in the absence of other data integrity techniques, is vulnerable in the non-degraded state (i.e., all the drives are online and working) and especially vulnerable in the degraded state, where a bit error will corrupt the data rebuilt onto the replacement drive. There it can lie dormant and undiscovered until the file is accessed.
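The degraded-state risk can be demonstrated with a toy model. The sketch below assumes plain XOR parity and hypothetical 4-byte blocks: one disk has failed, and during the rebuild a surviving disk silently returns a single flipped bit.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# The disk holding data[2] has failed. While rebuilding, the disk holding
# data[0] silently returns one flipped bit and raises no error.
bad_read = bytes([data[0][0] ^ 0x01]) + data[0][1:]
rebuilt = xor_blocks([bad_read, data[1], data[3], parity])

print(rebuilt)             # the block written to the replacement drive
print(rebuilt == data[2])  # False: the rebuild "succeeded" with wrong data
```

Nothing in the rebuild path signals a problem, because with one disk missing there is no redundant information left to cross-check the surviving reads.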

RAID-6

Newer RAID-6 technologies are an “N+2” implementation. This means that RAID-6 can tolerate two simultaneous device failures.

The most important potential benefit of RAID-6 adoption, however, has been largely ignored. In the normal operating state, RAID-6 has the opportunity to provide a single level of protection against the bit error rate problem. In a nutshell, there is the data hard drive and two parity hard drives, resulting in three “votes.” In the event that one of the drives returns bad data, the other two can “vote,” with the majority accepted as the correct data, allowing the system to both detect and correct the error. One downside of this implementation is that, for the system to have this data corruption protection, it must always operate on an entire stripe at a time, for both reads and writes, and most RAID-6 implementations do not.
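One way such detect-and-correct behavior can be realized is with the two-syndrome (P+Q) arithmetic commonly described for RAID-6. The sketch below is an assumption about how an implementation might do it, not a description of any particular product: it works on one byte per disk, uses the conventional GF(2^8) polynomial 0x11d with generator 2, and all function names are hypothetical.

```python
# Log/antilog tables for GF(2^8), polynomial 0x11d, generator 2.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def syndromes(data, p, q):
    """XOR the stored parities with parities recomputed from the data bytes."""
    sp, sq = p, q
    for i, d in enumerate(data):
        sp ^= d                     # P is plain XOR parity
        sq ^= gf_mul(EXP[i], d)     # Q weights disk i by g^i
    return sp, sq

def encode(data):
    """Compute (P, Q) for one byte position across the data disks."""
    return syndromes(data, 0, 0)

def check_and_repair(data, p, q):
    """Read the full stripe; locate and fix a single bad data byte if possible."""
    sp, sq = syndromes(data, p, q)
    if sp == 0 and sq == 0:
        return "clean", list(data)
    if sp != 0 and sq != 0:
        z = (LOG[sq] - LOG[sp]) % 255   # sq = g^z * sp if only data[z] is wrong
        if z < len(data):
            fixed = list(data)
            fixed[z] ^= sp              # sp equals the error pattern itself
            return f"repaired data disk {z}", fixed
        return "uncorrectable", list(data)
    return "parity block corrupt", list(data)

original = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]  # 8 data disks
p, q = encode(original)

damaged = list(original)
damaged[5] ^= 0x40                      # silent single-bit corruption on disk 5
status, repaired = check_and_repair(damaged, p, q)
print(status, repaired == original)
```

Note that the check only works because every data byte in the stripe plus both parities are read together, which matches the full-stripe requirement described above.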

There are multiple RAID-6 implementations, but Figure 2 is one example in an 8+2 (eight data and two parity) configuration.


Figure 2: RAID-6 with 2 parities (in this example, 8+2) allows data to be reconstructed even in the event of 2 simultaneous failures.

RAIN in the Clouds

To support massive scale over extended timeframes, though, something even better than RAID-6 is needed.

A RAIN architecture that redundantly encodes across nodes (i.e., “computers”), not just disks, can tolerate not only a hard drive failure or bit error, but also a complete node failure, or a bit error from any cause, whether network, power supply, memory or a hard drive. This node striping not only improves data integrity, but also improves system availability and the overall scalability of the system.

Increasing the parity protection to an N+3 implementation allows the system to tolerate three simultaneous failures (whether of a single disk or an entire node) in a given stripe and still provide both availability and data integrity.


Figure 3: RAIN-6 example in an 8+3 configuration has 3 parities that ensure data availability and accessibility even in the event of 3 simultaneous failures of any kind, including drives, network links or even entire storage nodes.

An example would be an 8+3 encoding where 8 pieces of primary data are encoded into 11 chunks, and those 11 chunks are then distributed among 11 independent storage nodes.
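The trade-off between the layouts discussed so far can be put into numbers. This sketch assumes each scheme can lose as many chunks as it has parities (true of the N+1, N+2 and N+3 encodings described here) and places one chunk per node; the node names and the `profile` helper are hypothetical.

```python
from fractions import Fraction

def profile(data_chunks, parity_chunks):
    """Summarize an N+M layout: chunk count, tolerated losses, raw capacity ratio."""
    total = data_chunks + parity_chunks
    return {
        "chunks": total,
        "tolerated_failures": parity_chunks,
        "raw_per_usable": Fraction(total, data_chunks),
    }

layouts = {"RAID-5 4+1": (4, 1), "RAID-6 8+2": (8, 2), "RAIN 8+3": (8, 3)}
for name, (n, m) in layouts.items():
    info = profile(n, m)
    print(f"{name}: {info['chunks']} chunks, survives {info['tolerated_failures']} "
          f"failures, {float(info['raw_per_usable']):.3f}x raw capacity per usable byte")

# 8+3 across nodes: one chunk per node, so losing a whole node costs
# any given stripe at most one of its 11 chunks.
nodes = [f"node-{i:02d}" for i in range(11)]
placement = dict(enumerate(nodes))   # chunk index -> node name
```

In this model the third parity raises raw capacity per usable byte from 1.25x to 1.375x, in exchange for surviving one more simultaneous failure.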

Addressing the Bit Error Rate Issues

A technique used by enterprise storage vendors to address the bit error rate weakness is a calculation called a cyclic redundancy check, or CRC. A CRC is a small “fingerprint” of data that can identify bit errors or corruptions in a larger set of data. By using the CRC, the storage system can compare a set of data to its CRC before allowing it to vote. Generally, only enterprise-level Fibre Channel and SAS drives support CRCs.
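The compare-before-trusting step can be illustrated with a small sketch. It uses the CRC-32 from Python's standard `zlib` module as a stand-in; real storage systems may use a different CRC polynomial and keep the checksum alongside each sector, and the `store` and `verified_read` helpers are hypothetical.

```python
import zlib

def store(block: bytes):
    """Return the block together with the CRC computed at write time."""
    return block, zlib.crc32(block)

def verified_read(block: bytes, stored_crc: int) -> bool:
    """Recompute the CRC at read time and compare it with the stored value."""
    return zlib.crc32(block) == stored_crc

block, crc = store(b"customer payload, stripe 7, chunk 3")
print(verified_read(block, crc))            # intact block passes

corrupted = bytearray(block)
corrupted[5] ^= 0x04                        # flip a single bit
print(verified_read(bytes(corrupted), crc)) # corrupted block is rejected
```

A block that fails this check is treated as missing rather than as a valid vote, so the parity data can be used to rebuild it.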

A cloud storage provider with enterprise integrity aims will implement a robust CRC with verification at read time as well as automated background data checking and correction. As an IT professional, I know that best practice is to “trust but verify,” meaning I want the ability to verify that my data is available and free from corruption independently of the vendor’s opinion. Having the cloud provider issue a write receipt with the data encoding would allow customers to independently verify and calculate that the data returned is free from corruption prior to relying on it.
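What such a write receipt might look like is sketched below. The receipt format is purely an assumption for illustration: it records a SHA-256 digest and the payload length, and the field names and helper functions are hypothetical rather than any provider's actual API.

```python
import hashlib

def make_receipt(object_name: str, payload: bytes) -> dict:
    """Provider side: describe what was written in a form the customer can re-check."""
    return {
        "object": object_name,
        "algorithm": "sha256",
        "digest": hashlib.sha256(payload).hexdigest(),
        "length": len(payload),
    }

def verify_against_receipt(payload: bytes, receipt: dict) -> bool:
    """Customer side: recompute the digest over returned data, independent of the vendor."""
    digest = hashlib.new(receipt["algorithm"], payload).hexdigest()
    return len(payload) == receipt["length"] and digest == receipt["digest"]

written = b"quarterly-backup contents"
receipt = make_receipt("backups/q3.tar", written)

print(verify_against_receipt(written, receipt))                        # True
print(verify_against_receipt(b"quarterly-backup c0ntents", receipt))   # False
```

Because the customer holds the receipt and runs the check locally, the verification does not depend on the provider's own integrity reporting.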

Data Integrity in the Cloud

The features I’ve outlined here, RAIN-6, CRCs, continuous scrubbing, and file-level hashes, all combine to provide better data availability, integrity and protection than is possible with even the most expensive on-premises enterprise storage systems. This forms a baseline requirement for cloud storage providers with enterprise data integrity aspirations.

Jeff Whitehead is vice president of Technical Operations, CTO, and co-founder of Zetta. Twitter: @jwhitehead

 

Computer Technology News