by Bob Roudebush
Do you have a disaster recovery (DR) plan in place for 2011? If so, when did you last revisit it? Even a good DR plan leaves room for improvement. If you don’t have a plan, what will you do when a disaster, large or small, strikes?
When you are considering (or reconsidering) your DR plan, keep four key principles in mind:
1. Understand what you are planning for. Do you want all your data back or all your services back? Many DR plans only include steps to get the data back, when you really need your services up and running.
2. Understand what the acceptable downtime is for each application. If possible, get this documented and agreed to throughout your organization.
3. Understand how long the DR process will take, and every step required to return an application to service. Organizations often skip this step or look at only part of the process, yet there may be far more steps, and far more time, involved than you originally thought.
4. Start small. Plan for the situations that will happen soonest and most often, not for the ones that will grab the most headlines.
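Points 2 and 3 amount to a simple comparison: add up every step needed to return an application to service and check the total against the downtime your organization has agreed to. The Python sketch below illustrates that arithmetic. The `RecoveryStep` type, the step names and every duration are hypothetical placeholders, not measurements or a recommended runbook, and it assumes the steps run one after another.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryStep:
    name: str
    minutes: float  # estimated duration of this step (placeholder values below)


def estimated_recovery_minutes(steps: List[RecoveryStep]) -> float:
    """Total time to return an application to service, assuming the steps run sequentially."""
    return sum(step.minutes for step in steps)


def meets_downtime_target(steps: List[RecoveryStep], acceptable_minutes: float) -> bool:
    """True if the estimated recovery fits within the agreed acceptable downtime."""
    return estimated_recovery_minutes(steps) <= acceptable_minutes


# Hypothetical runbook for one application; durations are illustrative only.
email_runbook = [
    RecoveryStep("Detect the outage and declare a disaster", 15),
    RecoveryStep("Provision a replacement server", 60),
    RecoveryStep("Restore data from backup", 120),
    RecoveryStep("Reconfigure and start the application", 45),
    RecoveryStep("Verify service with users", 20),
]

if __name__ == "__main__":
    total = estimated_recovery_minutes(email_runbook)
    print(f"Estimated recovery: {total:.0f} minutes")
    print("Within a 240-minute target:", meets_downtime_target(email_runbook, 240))
```

With these placeholder figures the runbook totals 260 minutes, so an application with an agreed downtime of four hours (240 minutes) would miss its target, even though the data restore itself accounts for less than half of the total. That gap is exactly what a data-only plan overlooks.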
One of the most important first steps is to define what counts as a disaster. There are, of course, the major disasters that grab news headlines, such as fires, floods, hurricanes and earthquakes. These are often the focus of DR projects, so that you can recover your data and keep your business running if the worst were to happen.
Minor disasters, though, are far more likely. While a major disaster might occur once a decade or even less often, minor disasters such as server outages, storage failures and application outages can occur as often as once a month. These events seem “too small to plan for,” but the reality is that, taken together, they are more costly to your business than major disasters are.
These events aren’t just annoyances. They can severely hurt user productivity, delaying production delivery and revenue recognition, and they degrade the customer experience and your brand, frustrating customers until they take their business elsewhere.
Planning for these minor disasters is what DR plans most often leave out. As you prepare for 2011, make sure you account for these “minor” events, which can actually cause major damage.
How to Prepare for These Disasters
There are several products available to help protect your company from disaster, but which is best for your business? When companies think about how best to protect themselves, the obvious answer is often to implement tape- or disk-based backup software. But is that enough?
When evaluating the many periodic backup software options available, consider reliability and whether the data will back up smoothly. Also consider how long it will take to move the data back into place. This step is hardly ever calculated accurately; in fact, restoring is usually slower than the backup itself, especially if a series of incremental backups must be applied on top of a ‘full’ backup.
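One reason restore time is underestimated is the way a conventional full-plus-incremental scheme is restored: you start from the most recent full backup taken at or before the target point, then apply every later incremental in order. The Python sketch below computes that restore chain and its total duration. It assumes a simple weekly-full, daily-incremental rotation (not differential backups), sequential restores, and made-up restore times; `BackupSet` and the helper names are hypothetical and do not model any particular backup product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackupSet:
    label: str
    kind: str              # "full" or "incremental"
    taken_at: int          # day index, for illustration only
    restore_minutes: float  # placeholder time to restore this set


def restore_chain(backups: List[BackupSet], target: int) -> List[BackupSet]:
    """Backup sets that must be restored, in order, to recover the state as of `target`."""
    eligible = sorted((b for b in backups if b.taken_at <= target), key=lambda b: b.taken_at)
    full_positions = [i for i, b in enumerate(eligible) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup exists at or before the target time")
    # Start at the latest full; every incremental after it must be applied in sequence.
    return eligible[full_positions[-1]:]


def restore_minutes(chain: List[BackupSet]) -> float:
    """Total restore time, assuming each set is restored one after another."""
    return sum(b.restore_minutes for b in chain)


# Hypothetical week: a full backup on Sunday, incrementals Monday through Friday.
week = [BackupSet("Sun full", "full", 0, 180)] + [
    BackupSet(f"{day} incr", "incremental", i, 25)
    for i, day in enumerate(["Mon", "Tue", "Wed", "Thu", "Fri"], start=1)
]
```

Restoring to Friday requires the Sunday full plus five incrementals, 305 minutes with these placeholder figures, whereas restoring to Tuesday needs 230. The later in the cycle the failure occurs, the longer the restore takes, and none of this time includes bringing the application itself back into service.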
You also need to consider how long it will take to return applications to their normal level of operation, which is typically a lengthy process. Because backup is centered on recovering data rather than applications, system administrators are often left scrambling and must be extremely creative to bring the restored data back together with a running application. While backup certainly has its place in DR plans, the complexity of the recovery process may not deliver a Recovery Time Objective (RTO) that falls within your organization’s Service Level Agreements (SLAs). Tape- or disk-based backup software, media and hardware are ideal for long-term data archiving and can help you recover to an earlier point in time after logical errors or corruption, but traditional backup alone cannot protect the availability of critical applications whose RTO is measured in minutes.
Often, a better way to ensure business continuity is to focus on application availability. One way to get applications back up and running more quickly is a traditional clustering model. However, clustering has its own challenges: it requires redundant, nearly identical hardware and shared storage, and often special versions of the operating system. Another issue is that clustering often provides no application monitoring, and when the clustering comes from the application vendor, it tends to be very application-specific. As a result, clustering deployments have tended to be slow and painstaking.
An alternative way to ensure application availability is server virtualization. If you go this route, it’s important to apply virtualization at the right level for your organization. Server virtualization has four protection components, each with its own weaknesses:
- Live server migration and storage migration. These can be automated, but they are often not application-aware, so they cannot detect when the application inside the virtual machine has developed a problem. In many cases the virtual console becomes a single point of failure, and migration often requires shared storage, so a failure at the storage level can affect every virtual machine that depends on it. Migration is also not host-aware, so at this level you may not learn of a host failure.
- High availability options. These are host-aware and still automated, but not application-aware, so you still have no visibility into the virtual machine itself. They also require shared storage.
- Fault tolerance. Here, data synchronization is automated, but the approach is still not application-aware. In some cases it doesn’t require shared storage, though in most cases it does. If there is a hard failure, the synchronized copy lets you start the virtual machine in another place, but you still don’t have a fine-grained view into the application itself.
- Image-based recovery products. This new breed of product alleviates some of the complexity of traditional recovery because it protects all of the data on a system, including user data, application binaries and underlying operating system files, which are backed up and recovered as a unit to either the same or dissimilar hardware. This makes recovery easier and shortens your RTO, but these products are still not automated and lack application awareness. That matters because the large majority of disasters are not environmental at all; they are related to application, operating system or human error.
When thinking about DR for 2011, it’s important to consider application availability, and to do so in an application-aware environment. An application-aware environment can detect a frozen application, where the application has stopped responding even though the operating system reports that everything is working fine. Application-aware systems can also detect degraded performance, which can have many different causes.
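The difference between “the process is running” and “the application is working” can be made concrete. The Python sketch below classifies an application as down, frozen, failed, degraded or healthy by combining what the operating system reports with a timed, application-level probe. The `probe` callable, the `timeout_s` and `degraded_s` thresholds and the status names are hypothetical illustrations; this is not how any particular availability product implements its monitoring.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as ProbeTimeout
from typing import Callable


def check_application(process_alive: bool, probe: Callable[[], bool],
                      timeout_s: float = 5.0, degraded_s: float = 1.0) -> str:
    """Classify application health from the OS view plus an application-level probe.

    process_alive: what the operating system reports (the process exists).
    probe: hypothetical callable that performs a real application transaction,
           such as a test query or login, and returns True on success.
    """
    if not process_alive:
        return "down"  # the OS itself sees the failure
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(probe)
    try:
        ok = future.result(timeout=timeout_s)
    except ProbeTimeout:
        return "frozen"  # process is running, but the application never answered
    except Exception:
        return "failed"  # the probe raised an error
    finally:
        pool.shutdown(wait=False)  # do not block on a hung probe thread
    elapsed = time.monotonic() - start
    if not ok:
        return "failed"  # the application answered, but the transaction failed
    if elapsed > degraded_s:
        return "degraded"  # working, but slower than the threshold
    return "healthy"
```

A real probe should exercise the application end to end; a simple check that the process exists, or even that a port is open, may report a frozen application as healthy, which is exactly the blind spot described above.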
Another benefit of many application-aware systems is that they don’t require matched, redundant hardware, so you can mix standalone servers and virtual machines. They also eliminate the need for shared storage, so you are protected from double-drive failures and more.
Application-aware availability has actually been around for quite some time. There has been a misconception that virtualization removes the need for it, but the combination of the two can be quite powerful. One advantage is that it provides a safety net as you start new projects: your secondary server can be a virtual machine while your primary machine stays physical. This also removes both the virtual console and shared storage as single points of failure. Because data can be moved and synchronized to any type of device, you get an “any to any” capability.
Conclusion
When thinking about your DR plans for 2011, make sure you understand what you are planning for, what your acceptable downtime is, what your needs are and what is involved in the DR process. Start small and plan for the events that will happen soonest and most often. While backup tools seem easy to implement, they might not be right for your environment. Focusing on application availability in an application-aware environment will keep your applications up and running, which ensures user productivity and customer satisfaction.
Bob Roudebush is the vice president of marketing at Neverfail (Austin, TX). www.neverfailgroup.com


