By Jim Reinert
We’re turning into a nation of hoarders. Email hoarders, that is. Corporate governance regulations combined with an all-pervasive “cover your back” mentality mean that we save more of our email correspondence than ever before.
This in itself is not a bad thing. Recent high profile cases of “lost” emails have sent shockwaves through the corporate world.
Nevertheless, the explosion in email usage (35 million emails are sent worldwide every day according to IDC) can only add to the workload of IT teams. While their colleagues are busy creating new emails, IT staff is working to ensure that the thousands already in existence are safely stored. Because of the high costs associated with storing email directly on the server, most companies now set a limit on mailbox sizes meaning that emails quickly pass from the server to tape.
It is not until somebody needs to retrieve an email that these email archives are put in the spotlight. Suddenly the managing director wants proof that a supplier contract was terminated six months ago and there’s no time to spare.
Easier said than done. Creating a recovery server, mounting entire mailboxes and searching through each one for the email in question can take hours as well as drain thousands of dollars from the IT budget. And all the while there is the pressure of responding to the urgent query: can this be done faster?
The answer is yes – with some clever technology in place.
What is stored? What needs retrieving?
Before we discuss technology, it’s important to look at what email archives contain, and what is driving requests for retrieval.
The main point to be made is that there is no predicting what might need to be dug out of an archive. The very appearance of email applications testifies to a mail hierarchy through the use of priority flags, such as “important” in the subject line and folders entitled “do not delete”. We recognize that some emails are more critical than others. Yet anyone who has had to analyze an email chain in order to explain an error, oversight or misunderstanding will tell you that no email can be dismissed as irrelevant.
In a legal case there is a very real possibility that the judge will ask to see every piece of correspondence between two parties.
There is also the matter of attachments. In the thrall of Google, we increasingly neglect filing in the faith that we will be able to find information using powerful search mechanisms. This extends to email – instead of storing documents on the server we save them as attachments. For this reason, many retrieval requests may not be accompanied by a specific subject line.
In short, our own working habits and an increasingly regulated environment mean that retrieval cannot be limited to emails of obvious exterior importance. The quick email fired off by an intern cannot be dismissed as less critical than the one sent to the finance department entitled VITAL.
Traditional methods of retrieval
The vast majority of businesses use the Microsoft Exchange email server (Microsoft Exchange Server hosted 126 million in-sourced corporate mailboxes worldwide in 2005, according to a report from Radicati Group). This makes perfect sense given the popularity of the Outlook application. There are very few drawbacks to this; however, it remains tortuously difficult to restore messages, mailboxes and other data. While we would like email recovery to be as easy for IT staff as Outlook makes using email for employees, that is simply not the case.
Exchange administrators can either run a full backup or back up one or more individual mailboxes (“brick-level” backup). While the first form of backup answers the requirements of today’s business continuity plans, it creates problems when a request comes through for the recovery of a single mailbox or individual message. The only option is to spend a great deal of money and time resurrecting the entire group of mailboxes. The challenge only deepens when it comes to searching for an individual message.
This is not impossible. By creating a duplicate Exchange server (“recovery server”), it is possible to copy the backup to this “clone” and then export individual mailboxes to .PST files. These can then be searched for the requested messages, which must then be copied back to the server.
In short, it’s not impossible to locate the needle in the haystack, but it’s certainly not a cheap or easy option. On average a recovery server costs between $10,000 and $15,000. True, it is possible to build a recovery server only when you need it, but that could take an entire day, which is generally impractical in a corporate environment where information is needed quickly.
Some experts say that the answer is to perform a brick-level backup. This method enables the restoration of single mailboxes, groups of mailboxes, single messages or groups of messages. On paper this sounds great. However, the “too good to be true” adage comes into play. Brick-level backups take significantly more space and time than full backups. A server with 400 mailboxes can take about one hour to do a full online backup. The same server, doing a brick-level backup of all of those mailboxes, one at a time, can take 18 hours.
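To put those figures in perspective, the short Python sketch below works through the arithmetic. It uses only the numbers quoted above (400 mailboxes, about one hour for a full backup, 18 hours for a brick-level backup); the variable and function names are illustrative, and real timings will vary with hardware and mailbox sizes.

```python
# Figures quoted in the article; actual timings depend on the environment.
MAILBOXES = 400
FULL_BACKUP_HOURS = 1.0
BRICK_BACKUP_HOURS = 18.0


def per_mailbox_minutes(total_hours: float, mailboxes: int) -> float:
    """Average backup time per mailbox, in minutes."""
    return total_hours * 60 / mailboxes


# How many times longer the brick-level backup takes overall.
slowdown = BRICK_BACKUP_HOURS / FULL_BACKUP_HOURS

# Average time spent on each mailbox under each method.
full_minutes = per_mailbox_minutes(FULL_BACKUP_HOURS, MAILBOXES)
brick_minutes = per_mailbox_minutes(BRICK_BACKUP_HOURS, MAILBOXES)

print(f"Brick-level backup takes {slowdown:.0f}x as long as a full backup")
print(f"Full backup: about {full_minutes * 60:.0f} seconds per mailbox")
print(f"Brick-level backup: about {brick_minutes:.1f} minutes per mailbox")
```

On these figures, the brick-level method averages roughly 2.7 minutes per mailbox against about nine seconds for the full backup, which is why an 18-hour run will not fit a typical overnight backup window.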
To conclude, it’s a question of finding the lesser of two evils: a full backup followed by recovery server creation costing thousands of dollars, or a brick-level backup costing hours of precious time.
The good news
What Exchange administrators need is a means of restoring individual messages, mailboxes and attachments from a previous full backup – the best of both worlds. Both Microsoft and Symantec have acknowledged that this is the “killer function” of a continuous backup product and have publicly goaded each other to be the first to develop this application.
The importance of being able to run full backups (instead of brick-level ones) cannot be overstated. Not only is it an important business continuity measure (in a crisis you may lose all your email files), it is much less space-consuming and thus keeps storage costs to a minimum.
However, the real key with this technology is to make it as search-centric as possible. Accustomed as we are to skimming through our files on Outlook, and searching by date, sender, subject etc., it is natural that we expect our IT teams to be able to scan archives in the same way.
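As a rough illustration of what “search-centric” means in practice, the Python sketch below filters a small in-memory list of archived messages by sender, subject and date range. It is a minimal sketch only: the `ArchivedMessage` type, the `search_archive` function and the sample messages are hypothetical, and it does not represent how any particular recovery product reads an Exchange backup.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ArchivedMessage:
    """Hypothetical record for one message held in an archive."""
    sender: str
    subject: str
    sent_on: date


def search_archive(
    messages: Iterable[ArchivedMessage],
    sender: Optional[str] = None,
    subject_contains: Optional[str] = None,
    sent_from: Optional[date] = None,
    sent_to: Optional[date] = None,
) -> List[ArchivedMessage]:
    """Return messages matching every criterion that was supplied."""
    results = []
    for msg in messages:
        if sender is not None and msg.sender.lower() != sender.lower():
            continue
        if (subject_contains is not None
                and subject_contains.lower() not in msg.subject.lower()):
            continue
        if sent_from is not None and msg.sent_on < sent_from:
            continue
        if sent_to is not None and msg.sent_on > sent_to:
            continue
        results.append(msg)
    return results


# Made-up sample data for illustration only.
archive = [
    ArchivedMessage("md@example.com", "Supplier contract terminated", date(2006, 1, 12)),
    ArchivedMessage("intern@example.com", "Quick question", date(2006, 1, 13)),
    ArchivedMessage("md@example.com", "Lunch on Friday?", date(2006, 3, 2)),
]

matches = search_archive(
    archive,
    sender="md@example.com",
    subject_contains="contract",
    sent_from=date(2006, 1, 1),
    sent_to=date(2006, 1, 31),
)
for msg in matches:
    print(msg.sent_on, msg.sender, msg.subject)
```

Each criterion is optional, so the same function covers a request with no known subject line (search by sender and date only) as well as a fully specified one.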
Ideally, this software should also work with your current backup procedures, so that starting to use it does not mean changing the existing business continuity strategy. It should also be possible to simply drag and drop the messages you wish to restore to the target location (usually a folder on the Exchange server).
There are mailbox recovery software products such as Ontrack Data Recovery’s PowerControls that often pay for themselves in just one use. Mailbox recovery software was developed precisely to end the choice between the expense of recovery from a full backup and the drain on resources produced by brick-level methods. It’s a tool that has been welcomed with open arms by highly regulated industries tasked with increasing the speed of document retrieval. Legislation including the Freedom of Information Act (2005) has put even greater pressure on organizations to speed up the process of responding to requests.
In this instance, it is important to return to the factors driving our requests for retrieval. The fact is that email restoration is no longer just another scenario covered in the disaster recovery plan. A combination of growing email use and smaller mailbox size limits means that requests for recovery are only likely to increase. Legislation related to data retention and transparency means that external pressures will continue to create work for the IT department.
In short, we need to see email not as a transitory item that gets “pinged” around the world, but as a record of our actions that lives on even when long forgotten by its creator. There’s nothing wrong with relying on it in this way; we just need to think carefully about how we take care of it. Email longevity is not guaranteed just by storing it, but by ensuring we can access it as easily in an archive as in an inbox.
Search is King according to the Google-enamoured stock market. Let’s make sure email enjoys its reign.
Jim Reinert is senior director of software and services, Ontrack Data Recovery (Eden Prairie, MN).
While talk of reduced recovery time, an end to brick-level backup and killer functions may be music to the ears of the IT department, communicating the benefits of this technology to the wider business can be a challenge.
www.ontrack.com