As a tidal wave of data washes over the enterprise, IT departments are under increasing pressure to control rising costs. Most bail out in the conventional way: they add more storage. The result is a generic infrastructure that is not optimized for any specific type of data or application, but is instead provisioned to the requirements of the most demanding data. And the consequences are costly. If an organization has 10TB of storage at a cost of $200,000 and experiences 100 percent annual growth, its storage-related purchases will soar from $200,000 in the first year to $400,000 in the second, $800,000 in the third, and so on.
An IT executive at a large financial services company described the dilemma this way: “As the environment grew, we needed to add more servers, more engineers, and more network components to support it, and we were spending money and time to back up information we didn’t have to back up.”
Clearly, the old ways of managing data are untenable in the face of these new challenges. For this company, and many others, an automated tiering solution is the answer.
The first step toward addressing the problem is recognizing that the majority of stored data is not business critical and does not warrant the performance, availability or cost of premier storage. As data ages, for example, it generally becomes less relevant to the business: in most organizations, around 70 percent of files have not changed in 3 months or more. Large amounts of space are also devoted to e-mail archive files, digital media (e.g., personal music libraries) or other types of data that are not business critical.
If not all data has equal relevance to the business, then not all data needs to reside on the same type, or “tier”, of storage. Different types of data can be stored on tiers that represent different performance, availability or cost points, enabling IT to build a far more cost-effective storage infrastructure. If the organization described previously purchased a second class of storage at 50 percent lower cost and moved 70 percent of its data to this second tier, its storage-related purchases would drop from $200,000 to $140,000 in the first year, and from $400,000 to $180,000 in the second. Further savings might be realized by adding additional tiers.
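To make the arithmetic concrete, here is a minimal sketch in Python of the cost model behind these examples, assuming annual purchases that double each year and a fixed share of each purchase landing on a tier that costs 50 percent less. The function names and the exact split are illustrative; the dollar figures in the text fold in additional assumptions (such as when data migrates), so treat this simplified model as a guide to the mechanics rather than an exact reproduction of the numbers above.

```python
# Illustrative cost model for the tiering example above. Assumptions:
# annual purchases double each year, and a fixed share of each purchase
# can land on a tier costing 50 percent less.

def annual_purchase(year, base_cost=200_000, growth=2.0):
    """Baseline spend: $200K in year 1, doubling each year."""
    return base_cost * growth ** (year - 1)

def tiered_purchase(year, tier2_share=0.70, tier2_discount=0.50):
    """Same purchase, with tier2_share of capacity bought at a discount."""
    base = annual_purchase(year)
    tier1 = base * (1 - tier2_share)
    tier2 = base * tier2_share * (1 - tier2_discount)
    return tier1 + tier2

for year in (1, 2, 3):
    print(f"Year {year}: baseline ${annual_purchase(year):,.0f}, "
          f"tiered ${tiered_purchase(year):,.0f}")
```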
The challenge in realizing these cost savings lies in the ability to do the following:
- Identify different types of data
- Place them on appropriate tiers of storage
- Manage this relationship over time
- Do all of this without disrupting client access to the data and without increasing management costs
File virtualization provides exactly these capabilities. It decouples client access to files from the physical location of those files, so clients no longer require explicit knowledge of where data resides. In fact, clients are unaware that data is moving between storage tiers, and their access to data is never disrupted.
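As an illustration of this decoupling (a conceptual sketch, not any vendor's implementation; all names and paths here are hypothetical), a virtual namespace can be modeled as a lookup table that maps the stable logical path clients use to a physical tier and location that can change underneath them:

```python
# Minimal sketch of the indirection at the heart of file virtualization:
# clients use stable logical paths; the physical location can change.

class VirtualNamespace:
    def __init__(self):
        # logical path -> (tier name, physical location)
        self._map = {}

    def place(self, logical_path, tier, physical_path):
        """Record (or update) where a file physically lives."""
        self._map[logical_path] = (tier, physical_path)

    def resolve(self, logical_path):
        """What happens implicitly on every client access."""
        return self._map[logical_path]

    def migrate(self, logical_path, new_tier, new_physical_path):
        """Move a file between tiers; the logical path never changes,
        so clients are unaware the data moved. (A real system would
        copy the data before flipping the pointer.)"""
        self._map[logical_path] = (new_tier, new_physical_path)

ns = VirtualNamespace()
ns.place("/projects/report.doc", "tier1", "nas1:/vol0/report.doc")
ns.migrate("/projects/report.doc", "tier2", "nas2:/archive/report.doc")
print(ns.resolve("/projects/report.doc"))  # clients still use the same path
```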
File virtualization solutions classify data based on flexible criteria such as age or type, place the data on the appropriate storage tier, and automate the movement of data throughout its lifecycle based on policies. For example, a company might create a policy that says (a sketch of how such rules might be encoded follows the list):
- All new files are placed on tier 1.
- Files that have not changed in 90 days are moved to tier 2.
- Files that have not changed in 1 year are moved to tier 3.
- If any file that has been moved to tier 2 or 3 changes, return it to tier 1.
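Here is one way those four rules might be expressed, using Python as illustrative pseudocode rather than any product's actual policy syntax. Because the rules key off time since last modification, the fourth rule falls out naturally: any change resets the file's modification time, so the file lands back on tier 1 on the next evaluation pass.

```python
import time

DAY = 86_400  # seconds per day

def choose_tier(mtime, now=None):
    """Illustrative version of the example policy: pick a tier based on
    how long ago the file was last modified."""
    now = now if now is not None else time.time()
    age_days = (now - mtime) / DAY
    if age_days >= 365:
        return 3   # untouched for a year or more -> tier 3
    if age_days >= 90:
        return 2   # untouched for 90 days -> tier 2
    return 1       # new or recently changed -> tier 1 (covers rule 4)
```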
Along with rising costs, growth increases backup pain. As the amount of data escalates, so too does the time it takes to complete a backup. When backups risk exceeding their allocated windows, IT departments face a choice between failing to meet service level agreements to the business or, worse yet, not adequately protecting the organization’s data.
The fact is, much of the data being backed up is identical to data captured in previous backups. Since only a small fraction of data is actually changing, the vast majority does not need to be backed up again and again. Non-business-critical content like personal music libraries or e-mail archives may not need to be backed up at all. The bottom line is that many companies are spending time and money backing up information they do not need to.
Once again, automated tiering with file virtualization offers a solution. File virtualization can distinguish data that is not changing and move it to a storage tier that is backed up less frequently, or not at all. Should the data change at some point in the future, it is automatically returned to the original storage tier for backup on the regular schedule. This dramatically reduces the amount of redundant data backed up on a regular basis and shortens backup times. Recently, a large media company used a file virtualization solution to move data that had not changed in a month out of the primary backup data set and saw its backup times drop from 36 hours to less than one hour.
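The mechanics can be pictured with a simple sketch: walk a directory tree and split files into an active set (backed up on the regular schedule) and a static set (backed up rarely or not at all) by last-modified time. The 30-day threshold mirrors the media-company example; the root path and function name are hypothetical.

```python
import os
import time

DAY = 86_400  # seconds per day

def split_backup_sets(root, static_after_days=30):
    """Partition files under root by last-modified age (illustrative)."""
    cutoff = time.time() - static_after_days * DAY
    active, static = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            (static if os.path.getmtime(path) < cutoff else active).append(path)
    return active, static

active, static = split_backup_sets("/srv/files")
print(f"{len(active)} files in the nightly backup set, "
      f"{len(static)} parked on the rarely-backed-up tier")
```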
Reducing the amount of data in the primary backup data set also cuts the costs of the backup infrastructure (which usually requires far more capacity than the primary data itself). Less data to back up means less tape, fewer tape libraries, less virtual tape, lower licensing costs, and reduced fees for offsite storage.
File virtualization enables an automated tiering solution that is seamless to clients and provides dramatic cost and efficiency benefits. Here are several elements to consider when looking for a file virtualization solution:
- The ideal solution will work with the storage environment you have today and provide flexible options for the future. It should not lock you into a specific storage technology or force you to change the infrastructure you already have.
- Look for a solution that will meet not only your current scale needs but also accommodate future growth. Solutions that rely on links or stub files can be difficult to remove, and often come with performance penalties and scale limitations.
- A solution that manages information at the file level rather than the directory level will give you greater flexibility and provide the greatest business value.
- The most effective tiering solutions manage data in real time. Placing a file on the right storage tier when it is created is more effective than searching for it and moving it after the fact.
With file virtualization, the classification, placement and movement of data across different tiers of storage throughout its lifecycle are automated and take place without disruption to users and applications. This significantly reduces storage and backup costs and shrinks backup times, dramatically improving IT’s ability to deal with storage growth.
Nigel Burmeister is director of Product Marketing for F5 Networks.