Making Multi-Level Cell (MLC) Flash Practical for Enterprise Solid State Drives (SSDs)

Solid state drives (SSDs) promise higher read/write performance, lower power consumption for a given workload, and higher reliability when compared with hard disk drives. As a result, we have seen modest usage in the enterprise, but broad-based SSD adoption won’t take place until manufacturers can use more cost-effective multi-level cell (MLC) flash memory. To date, the use of MLC has been held back by real-world issues with performance and reliability. For example, current “high performance” MLC-based SSDs provide non-sequential write performance that is sometimes slower than hard disk drives.

In this article, we will look at the key technical issues that are preventing SSDs from broad-based adoption by enterprise server and laptop makers, including endurance, reliability, performance and security, and show how a new class of SSD processor can overcome them.

Key barriers to success

The fundamental barriers to widespread use of NAND flash-based SSDs in the enterprise are cost, endurance, write performance, and reliability.

Cost: Using standard SSD controllers, SSD makers are forced to use SLC flash because MLC flash endurance, write performance, and reliability are inadequate. But SLC flash costs 3-4 times more, making drives too expensive for widespread use. The solution to the cost problem is to use MLC flash, but this requires overcoming the barriers of endurance, write performance, and reliability.

Endurance: Flash memory supports only a limited number of write cycles, and the write endurance of MLC flash is much lower than that of SLC flash. SSDs using MLC flash with standard controllers in an enterprise environment could easily wear out in just 23 days of use – far too short to be useful. Manufacturers work around this limitation by over-provisioning the SSDs, keeping “spare” flash memory on hand to replace blocks as they wear out. They also limit warranty coverage to a few years or less and impose warranty limits on the amount of data written to the drive per day.

Write Performance: Today’s SSDs have write performance that is only one-tenth of their read performance. This is caused by a combination of the controller limiting the number of writes to the drive (to artificially extend endurance) and a high write amplification factor. Write amplification is a measurement of the additional program/erase cycles that must be executed on the flash interface beyond the original data written from the host. It occurs in flash-based drives because flash memory must be erased before it can be rewritten, and these erasures occur in large blocks of data, not individual bytes. Write amplification also accounts for any wear-leveling inefficiency that causes the drive to write even more to the flash. For standard controllers, a write amplification factor of 40 is not uncommon with the randomly written data that is typical of enterprise workloads.
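To make the idea concrete, here is a minimal sketch in Python. It uses the common ratio form of write amplification (bytes physically written to flash divided by bytes the host asked to write). The function name and the 4 KiB / 256 KiB sizes are illustrative assumptions, not figures from any particular drive.

```python
def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """Ratio of bytes written to flash to bytes written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


# Hypothetical worst case: updating a single 4 KiB sector forces the
# controller to erase and rewrite an entire 256 KiB block.
HOST_WRITE_BYTES = 4 * 1024       # assumed host write size
ERASE_BLOCK_BYTES = 256 * 1024    # assumed erase block size

waf = write_amplification(HOST_WRITE_BYTES, ERASE_BLOCK_BYTES)
print(waf)  # 64.0
```

Under these assumed sizes, every byte the host writes costs 64 bytes of flash wear, which is why small random writes are so much more expensive than large sequential ones.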

Reliability: The level of ECC error protection available on SSDs today (<1 unrecoverable sector per 10^15 bits read) is less than that of top-performing HDDs. With the higher data read rates of SSDs, a higher level of protection is required to provide the same reliability, yet today’s SSDs do not provide it. The unrecoverable errors and complete die failures of MLC flash are more than 1000 parts per million (PPM) and getting worse. In a fully configured SSD (assuming 128 flash die are being used), that failure rate could easily reach 12 percent over the life of the drive. SSD manufacturers expect system-level RAID protection to manage these errors, but RAID rebuilds take a heavy toll on performance in some server environments, and certainly notebook computers don’t have that redundancy option. The real answer is to make the drives themselves more reliable.
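The 12 percent figure is what you get by compounding the per-die failure rate across every die in the drive. A minimal sketch, assuming die failures are independent and that any single die failure counts as a drive failure (the function name is illustrative):

```python
def drive_failure_probability(die_failure_ppm: float, die_count: int) -> float:
    """Probability that at least one die fails, assuming independent failures."""
    p_die = die_failure_ppm / 1_000_000
    return 1.0 - (1.0 - p_die) ** die_count


# Figures from the text: 1000 PPM per die, 128 die per drive.
p = drive_failure_probability(1000, 128)
print(f"{p:.1%}")  # 12.0%
```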

The SSD Processor

SSD processors are a new type of flash controller that enables the use of MLC flash by overcoming the key technical barriers outlined above. With an SSD processor, endurance, write performance, and reliability are each optimized to their full potential, providing a superior SSD using commodity flash memory.

Endurance: Standard SSD controllers have high write amplification factors. SSD processors have the data reduction capability to reduce the number of writes by 80 times or more over these standard controllers. Thus, by using an SSD processor rather than a typical SSD controller, designers can create an MLC flash-based SSD that can operate in a typical enterprise environment for 5 years with no daily write restrictions.
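A rough way to see why write amplification dominates endurance is a simple wear-out model. The sketch below assumes ideal wear leveling, and every numeric input (capacity, program/erase cycle rating, daily host writes) is a hypothetical value chosen for illustration, not a figure from this article or from any specific drive. Only the factor of 40 and the 80-times reduction come from the text.

```python
def lifetime_days(capacity_gb: float, pe_cycles: float,
                  host_gb_per_day: float, waf: float) -> float:
    """Days until rated P/E cycles are exhausted, assuming ideal wear leveling."""
    total_flash_writes_gb = capacity_gb * pe_cycles
    flash_gb_per_day = host_gb_per_day * waf
    return total_flash_writes_gb / flash_gb_per_day


# Hypothetical drive and workload (assumptions, not measured data).
CAPACITY_GB = 128.0
PE_CYCLES = 5000.0
HOST_GB_PER_DAY = 500.0

standard = lifetime_days(CAPACITY_GB, PE_CYCLES, HOST_GB_PER_DAY, waf=40.0)
reduced = lifetime_days(CAPACITY_GB, PE_CYCLES, HOST_GB_PER_DAY, waf=40.0 / 80)

print(standard)            # 32.0 days
print(reduced / standard)  # 80.0
```

Whatever inputs are chosen, lifetime in this model scales inversely with the write amplification factor, so cutting flash writes by 80 times stretches the same flash 80 times further.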

Write Performance: Much like endurance, write performance problems are largely caused by write amplification, but other inefficiencies in standard SSD controllers can also slow it down. With dedicated processing power, an SSD processor can optimize the write operation to the flash memory so the write performance of an SSD matches the read performance. There is no need to artificially restrict the write performance to enable the endurance to meet the 5-year life requirement. (See Figure 1.)

Figure 1: Sustained IOPS for Random Reads and Writes in standard SSDs compared to SSDs using SSD processors.

Reliability: Using an SSD processor also improves reliability because it allows designers to implement a RAID-like capability using just a single drive. By enabling RAID-like capabilities that span multiple die, SSD processors can allow for recovery of previously unrecoverable errors or even a total die failure. This lowers the failure rate of a high-capacity enterprise SSD by 100 times to only 0.13 percent. The SSD processor technology could also provide a significantly higher level of ECC protection, enabling an SSD with a read error rate of less than 1 sector error per 10^17 bits read. (See Figure 2.)
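This article does not describe how the RAID-like protection is implemented inside an SSD processor. As a generic illustration of the principle only, the sketch below uses simple single-parity XOR across pages held on different die, in the style of RAID 5. The function names and the tiny 4-byte pages are assumptions made for the example.

```python
from functools import reduce
from typing import Sequence


def xor_parity(pages: Sequence[bytes]) -> bytes:
    """XOR equal-length pages together, byte by byte."""
    if not pages:
        raise ValueError("at least one page is required")
    if any(len(page) != len(pages[0]) for page in pages):
        raise ValueError("all pages must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def rebuild_page(surviving_pages: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the one missing page from the survivors plus the parity page."""
    return xor_parity([*surviving_pages, parity])


# One page per die (hypothetical contents), plus a parity page on another die.
die_pages = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(die_pages)

# Simulate the total loss of the second die and rebuild its page.
recovered = rebuild_page([die_pages[0], die_pages[2]], parity)
print(recovered == die_pages[1])  # True
```

Single parity of this kind tolerates the loss of one die per stripe, at the cost of one extra page of flash for each stripe.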

Figure 2: Time between sector errors based on the same usage model for different levels of ECC protection and drive types.
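The relationship behind this kind of comparison is straightforward: the expected time between unrecoverable sector errors is the error-rate denominator divided by the rate at which bits are read. The usage model behind the figure is not given here, so the sustained read rate in the sketch below (250 MB/s, around the clock) is purely an assumption for illustration.

```python
SECONDS_PER_DAY = 86_400


def days_between_sector_errors(bits_per_error: float, read_mb_per_s: float) -> float:
    """Expected days between unrecoverable sector errors at a constant read rate."""
    bits_per_second = read_mb_per_s * 1_000_000 * 8
    return bits_per_error / bits_per_second / SECONDS_PER_DAY


ASSUMED_READ_MB_PER_S = 250.0  # hypothetical sustained read rate

for bits_per_error in (1e15, 1e17):
    days = days_between_sector_errors(bits_per_error, ASSUMED_READ_MB_PER_S)
    print(f"1 error per {bits_per_error:.0e} bits: {days:.1f} days")
```

Moving from 10^15 to 10^17 bits per error lengthens the expected interval by a factor of 100 at any fixed read rate.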

By using MLC flash, SSD makers can create drives that drop the underlying flash memory cost by nearly 75 percent, enabling widespread use in servers and laptops. SSD processors make it practical to use MLC flash by making MLC-based drives faster, more reliable, and longer-lasting than SLC-based drives with traditional controllers.

Thad Omura is vice president of marketing for SandForce.