Tuesday January 06, 2009

Combining virtualization and fault tolerance makes too much of a good thing wonderful

Actress Mae West once quipped that “Too much of a good thing can be wonderful.” That is a great laugh line, but it’s seldom true in practice. Excess usually costs in the end.

Whatever the racy comedienne was thinking when she coined that bon mot, it most assuredly wasn’t virtualization. But it applies perfectly to this hot trend in corporate technology. Virtualization is a good thing so long as corporate IT managers know when to say enough. In this case, “enough” is knowing when consolidation compromises security and reliability.

Though an undeniably positive technology development, virtualization re-introduces a potential problem that CIOs have fought for years—the single point of failure. That seems to contradict one of virtualization’s inherent benefits, which is the ease of setting up multiple instances of application servers throughout the server environment. With backup copies ready to take over if a virtual server crashes, the virtualized system essentially backs itself up.

That raises two questions, however. First, at what point does having many virtual application servers as backups erode the benefits of server consolidation through virtualization? And second, what happens when the physical servers themselves crash? In a conventional IT environment, a single server crash usually affects only part of the company’s operations. It might take down a critical application and possibly affect other operations integrated with it. However, in a virtualized environment each physical server might support five or 10 critical applications. The customer database going down is a problem. The whole server going down and taking the customer database, e-mail system, file servers, e-commerce system and accounting applications with it is a disaster.
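The trade-off in that paragraph can be put in rough numbers. The sketch below is only an illustration: the function name, the 10-application estate and the failure rate of 0.5 per host per year are hypothetical, and it assumes every physical host fails at the same rate whether it runs one application or 10.

```python
import math

def outage_profile(num_apps, apps_per_host, failures_per_host_per_year):
    """Summarize how host failures hit applications at a given consolidation level.

    Assumes (hypothetically) that every host fails at the same yearly rate
    and that a host failure takes down every application on that host.
    """
    hosts = math.ceil(num_apps / apps_per_host)
    return {
        "hosts": hosts,
        # Fewer hosts means fewer failure incidents per year...
        "incidents_per_year": hosts * failures_per_host_per_year,
        # ...but each incident takes down up to this many applications.
        "apps_down_per_incident": apps_per_host,
    }

# Conventional layout: one application per server.
conventional = outage_profile(num_apps=10, apps_per_host=1,
                              failures_per_host_per_year=0.5)

# Virtualized layout: all 10 applications on one physical server.
virtualized = outage_profile(num_apps=10, apps_per_host=10,
                             failures_per_host_per_year=0.5)
```

Under these assumptions consolidation makes failures rarer but far larger: the conventional layout sees more incidents that each stop one application, while the virtualized layout sees fewer incidents that each stop all 10 at once.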

In this context, there can be such a thing as too much virtualization if the hardware doesn’t keep pace with the software. Virtualization strategies must include availability strategies that encompass the full range of availability technologies, from clustering to fault-tolerant servers. CIOs and data center managers who think that virtualization is a de facto availability strategy don’t really have an availability strategy. They have too much of a good thing, with a bad thing waiting just around the corner. This doesn’t detract from virtualization’s benefits. It just requires a more analytical look at availability in virtualized environments, and a recognition of virtualization’s realities.

The right solution at the right time

It’s easy to understand the temptation to go overboard virtualizing data center servers with little regard to availability issues. Virtualization is the technology of the moment, and unlike many technologies of the moment, it lives up to its hype. With data center sprawl reaching crisis proportions, server virtualization reduces data center footprints, overhead costs and power consumption—bottom-line economics and green economics in one package.

Corporate IT managers have aggressively embraced virtualization to rein in data center sprawl. Almost 80 percent of companies surveyed by the Yankee Group are using some kind of server virtualization technology, and 85 percent of the money spent on virtualization software went to server consolidation projects. Implementing several virtual servers on a single physical server raises utilization rates from an average of 10 percent per box to as much as 80 percent, according to the Yankee Group’s 2007 report on server virtualization. Virtualization’s arrival didn’t come a moment too soon for image-conscious companies, either. Quoting a McKinsey & Co. study, a May 1 New York Times article predicted that at their current pace, data centers will be the biggest producers of greenhouse gases by 2020.
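To see what the utilization figures imply for server counts, here is a back-of-the-envelope sketch. The 100-server estate and the function name are hypothetical, and the calculation assumes identical hosts and workloads that pack evenly, which real data centers rarely achieve.

```python
def hosts_after_consolidation(num_servers, avg_util_pct, target_util_pct):
    """Estimate the physical hosts needed after consolidation.

    Utilization is given in whole percent. Assumes identical hosts and
    workloads that can be packed evenly up to the target utilization.
    """
    total_load = num_servers * avg_util_pct      # total work, in "percent of one host"
    return -(-total_load // target_util_pct)     # ceiling division on integers

# Hypothetical estate: 100 servers averaging 10 percent utilization,
# consolidated onto hosts run at 80 percent.
hosts_needed = hosts_after_consolidation(100, 10, 80)
```

With these inputs the estimate is 13 hosts instead of 100, which is the footprint reduction that makes consolidation so attractive, and also the reason each remaining host matters so much more.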

On the management side, virtualization sharply reduces provisioning costs. IT managers use templates to set up virtual servers and shift them among existing physical servers according to available processing capacity. That enables IT to balance workloads and improve efficiency without configuring and implementing new servers for each new application. This ease in provisioning resources is what leads to the perception of virtualization as a high-reliability solution. Setting up virtual servers on different machines is a partial safeguard against hardware malfunctions and a way to avoid planned downtime for upgrades, patches and the like.

However, it does not guarantee continuous uptime for applications that must run 24x7x365. At the end of the day, even the best virtualization scheme depends on processors, disk drives, power sources, interfaces and other components running reliably and without interruption. Virtualization worked on mainframes because of their layers upon layers of redundant, backup and recovery systems. The typical x86 data center server doesn’t have even an infinitesimal fraction of the mainframe’s reliability. If a physical server supporting 10 applications has a disk drive or power source failure, processing stops. Even if there are other copies of the applications ready to go on other servers, the processing chain is broken, in-stream data and transactions are lost, and the backup database might not be completely up to date. In this scenario a virtualized data center could recover from a failure more quickly than a conventional data center, but with only slightly less loss.

“Common wisdom says not to put all your eggs in one basket for fear of something happening to the basket,” said the Yankee Group’s 2007 report Server Virtualization Creates New Opportunities for Fault-Tolerant Servers. “Companies are now consolidating their infrastructure using virtualization technology. However, the fear is that by reducing hardware and running many critical applications on fewer machines, single points of failure may be reintroduced,” it continued.

For non-critical resources, high-availability solutions such as clustering and the growing array of reliability features built into virtualization software provide sufficient uptime and recovery. Critical applications and data, however, need continuous availability solutions, such as fault-tolerant servers. Fault-tolerant servers are essentially two servers in one, running in lockstep, each able to carry the processing load with no break in processing in case of failure. When huge sums of money, and even lives, depend on applications running continuously, there is no substitute for fault-tolerant technology.

Multi-tiered reliability strategies

Kansas City Terminal (KCT) Railway is among the many companies that wanted virtualization’s efficiency and economy for a major new software implementation. Unlike many companies, however, KCT was also wary of hardware failures and the resulting damage. KCT is the second-largest U.S. rail hub, running 350 trains daily over its 85 miles of strategically located tracks in Kansas and Missouri. A single switching error can cause lost revenue, millions of dollars per hour in extra operational expenses, and schedule delays lasting weeks all over the country.

When KCT rolled out its virtualized ECIS (Enterprise Control and Information System), it was on a fault-tolerant hardware server that limited expected unplanned downtime to a few seconds a year at most. KCT’s IT management company, Railware, created a virtual Windows 2003 server on the fault-tolerant server for application logic, SQL Server, and front-end processor functions. The virtualized environment replaced a more server-intensive architecture in which each function ran on its own Windows 2003 machine. Switching to a virtualized infrastructure reduced operational overhead, simplified KCT’s IT infrastructure and eliminated several points of failure from the KCT network. KCT executives would not, however, approve the virtualization project without assurances from Railware that the system was immune from unscheduled downtime, according to Railware President Ross Pirtle.
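A downtime figure of a few seconds a year is easier to compare with other availability claims once it is converted to a percentage. The sketch below shows the conversion; the function names are illustrative, and the 5-second input is only a hypothetical reading of "a few seconds," not a figure reported by KCT or Railware.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a non-leap year

def availability_from_downtime(downtime_seconds_per_year):
    """Return availability as a fraction of the year the system is up."""
    return 1 - downtime_seconds_per_year / SECONDS_PER_YEAR

def downtime_from_availability(availability):
    """Return the yearly downtime, in seconds, implied by an availability fraction."""
    return (1 - availability) * SECONDS_PER_YEAR

# Hypothetical: 5 seconds of unplanned downtime in a year.
fault_tolerant = availability_from_downtime(5)

# For comparison, "three nines" (99.9 percent) availability.
three_nines_downtime = downtime_from_availability(0.999)
```

Five seconds a year works out to roughly 99.99998 percent availability, whereas a 99.9 percent server is allowed about 31,536 seconds of downtime a year, or close to nine hours.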

“We feel confident doing this for Kansas City Terminal Railway because of the extremely high levels of reliability provided by the fault-tolerant hardware and the virtualization software from VMware. We simply would not have created a virtualized system otherwise,” Pirtle said. KCT has had zero unscheduled downtime since it implemented the virtualized system.

KCT is an example of a company realizing the cost, efficiency and environmental benefits of virtualization by implementing the virtual environment on the hardware platform best suited for the task. In doing so, it proves Mae West right: too much of a good thing really can be wonderful.

Denny Lane is director of product marketing and management at Stratus Technologies.