First Person: Email Recovery Under Exchange

By Robert England

Exchange is much more than a simple email infrastructure; it is a vital part of every employee’s business day. From simple messaging and workflow management to purchasing approvals and shipping schedule adjustments, Exchange is central to the operation of a business. An Exchange environment that functions and is readily available is only the baseline. Beyond that come performance concerns, daily support and ongoing improvements. One thing that can't be overlooked is a comprehensive backup and recovery strategy. IT organizations need a holistic view of Exchange, because any unplanned disruption in service, whether caused by network, software or hardware problems, can be very costly.

At LSI, we were fortunate enough to be able to upgrade to Exchange 2007 as part of the company's merger with Agere Systems in April 2007. The merger afforded us a unique perspective on what both companies were doing right and what could be improved. The two companies used different backup systems and hardware standards, and those differences had to be overcome. As IT teams and processes were streamlined, the Client Server Integration team took on this challenge.

Building the Recovery Plan

When building out an Exchange recovery plan for a global company, the high-level system architecture plays an important role. The servers need to be located in data centers that have both the space and the network bandwidth to support Exchange backups. This can be a delicate balancing act between user performance and disaster recovery. We worked with the Storage team to determine which locations could handle the backups and growth. We also worked with the WAN team to determine which locations were centralized enough to support Exchange. We decided that four specific sites across the world would host Exchange mailboxes regionally. We felt this would give us enough geographic diversity to overcome a major location-related outage while also meeting the performance requirements of our employees.

During the design phase, IT took an extensive look at what kinds of Help Desk tickets were being logged. Because of the high volume of requests for larger mailboxes, we decided to upgrade to the latest LSI Engenio® storage systems available at that time. This resulted in two key benefits: an increase in each user’s mailbox size from 250MB to 550MB, and a significant increase in the performance of our database.

Since we were anticipating additional mailbox growth due to new hires and the increase in overall mailbox size, keeping data store sizes manageable was our next challenge. Store size not only plays an important role in overall performance, but also determines how quickly Exchange can recover from a data store outage. With Exchange 2003, we had become complacent with one storage group containing five mailbox stores. This resulted in large databases that would take too long to restore if just one store had an issue. With Exchange 2007, we decided on 100 users per data store. This capped each mailbox store at roughly 50GB even if every mailbox reached its size limit. A store of this size would allow us to shorten our service level agreements (SLAs) for data store recovery.
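The sizing rule above is simple arithmetic, and it can help to write it down. The following is a minimal sketch, not the tooling we used: the function names are made up for illustration, the 1,200-user headcount is a hypothetical figure, and only the 100 users per store and the 550MB quota come from the text.

```python
import math

def max_store_size_gb(users_per_store: int, quota_mb: int) -> float:
    """Worst-case store size in GB (1 GB = 1024 MB) if every mailbox is at quota."""
    return users_per_store * quota_mb / 1024

def stores_needed(total_users: int, users_per_store: int) -> int:
    """Number of data stores required, rounding up for a partly filled last store."""
    return math.ceil(total_users / users_per_store)

# Figures from the article: 100 users per store, 550MB per mailbox.
worst_case_gb = max_store_size_gb(100, 550)
print(f"Worst-case store size: {worst_case_gb:.1f} GB")

# Hypothetical headcount, for illustration only.
print(f"Stores needed for 1,200 users: {stores_needed(1200, 100)}")
```

With the article's figures the worst case works out to a little under 54GB, in the same range as the rounded 50GB figure quoted above. Changing either the quota or the users-per-store number shows immediately how the largest restore you may have to perform would grow.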

Scaling

As our company continues to grow, the volume of email grows rapidly as well. Beyond simple email volume, the quantity and size of file attachments also continue to grow, which effectively reduces the mailbox space available for plain email. Providing additional mailbox space for users takes more than just adding disk drives, and the problem is not restricted to Exchange or email data; it encompasses all unstructured data. Disk drives are relatively cheap, and most companies see adding disk as the quick and easy solution. However, when adding space to an Exchange environment, or any file-based environment, the costs go much deeper than the price of disk. IT administrators must also consider what capacity is available to back up the additional data being stored. Hard costs in equipment and soft costs in time also start to climb. With more data being stored, there is a good probability that backups will overrun the available backup windows, impacting other systems being backed up and affecting their SLAs, and if that happens the cost could be much higher.

We had to consider not only the size of each disk, but also the LUN layout, the number of drives and the RAID type. After balancing the costs, we decided on a simple mirror for the logs with a RAID 5 back end for the data stores. On top of these safeguards, each storage array also has a standby spare for an extra layer of protection. When planning the LUN layout, it’s important to design and build for system uptime as well as for the ability to recover from a hardware failure within the corporation's SLA.

After the migration to Exchange 2007, we performed periodic database restores to determine our average restore times. We found that backing up 15 stores of 35-40GB each took around 2.5 hours on average. A single restore of our largest database (at 50GB) took only 34 minutes. This was acceptable for a single store failure.
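The backup and restore timings above can be turned into rough throughput figures, which are handy for estimating whether a larger store would still fit a recovery SLA. This is a back-of-the-envelope sketch only: the function names are illustrative, the 60-minute SLA is a hypothetical value, and the estimate assumes restore time scales linearly with database size, which real hardware may not honor.

```python
def throughput_gb_per_min(size_gb: float, minutes: float) -> float:
    """Average data rate for a backup or restore job."""
    return size_gb / minutes

def estimated_restore_minutes(size_gb: float, gb_per_min: float) -> float:
    """Projected restore time, assuming time scales linearly with size."""
    return size_gb / gb_per_min

def fits_sla(size_gb: float, gb_per_min: float, sla_minutes: float) -> bool:
    """True if the projected restore completes within the SLA."""
    return estimated_restore_minutes(size_gb, gb_per_min) <= sla_minutes

# Figures from the article: a 50GB restore in 34 minutes,
# and 15 stores of 35-40GB each backed up in about 2.5 hours (150 minutes).
restore_rate = throughput_gb_per_min(50, 34)
backup_rate_low = throughput_gb_per_min(15 * 35, 150)
backup_rate_high = throughput_gb_per_min(15 * 40, 150)

print(f"Restore rate: {restore_rate:.2f} GB/min")
print(f"Aggregate backup rate: {backup_rate_low:.1f} to {backup_rate_high:.1f} GB/min")

# Hypothetical 60-minute SLA for a single store.
print("50GB store within SLA:", fits_sla(50, restore_rate, 60))
print("100GB store within SLA:", fits_sla(100, restore_rate, 60))
```

Under these assumptions, doubling the store size to 100GB would push a restore to roughly 68 minutes, which illustrates why keeping stores small was worth the extra administration.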

Fear of Failure

Today, our biggest challenge is what to do in the event of a true server failure. We retain a hardware service contract on each Exchange server with a response time that falls within our SLA. In our recovery estimate, we included both the time to rebuild the server from scratch and the time to install Exchange again in disaster recovery mode, so that service can be brought back online within our SLA. We considered deploying a cluster or standby server, which would drive down our recovery time, although the benefit would depend on the nature of the outage. We examined these options and determined that they were an unnecessary expense because recovery is quick enough, even in the event of a hardware failure and rebuild.

Staying on top of Exchange means keeping a sharp eye on the backups. Ensuring that backups are functioning and capturing everything is just as important as supporting the Exchange environment itself. In fact, it could be even more important. If backup failures go unnoticed and an outage then occurs, the inability to recover could be detrimental to a company’s operation. Besides looking at daily backup reports, a supplemental way we monitor our backups and their successes is to run a PowerShell command:

  • Get-MailboxDatabase -Status | Sort-Object LastFullBackup | Format-Table Identity, BackupInProgress, LastFullBackup

This command can be run manually or on a schedule, and it shows when each database last completed a full backup and whether a backup is currently in progress.
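If the command's output is exported on a schedule, a small check can flag databases whose last full backup is missing or too old. The sketch below assumes the output has already been loaded into a list of records with Identity and LastFullBackup fields; the function name, the sample database names and dates, and the 24-hour threshold are all hypothetical, and this is not a script we ran in production.

```python
from datetime import datetime, timedelta

def stale_databases(rows, now, max_age=timedelta(hours=24)):
    """Return identities whose last full backup is missing or older than max_age."""
    stale = []
    for row in rows:
        last = row.get("LastFullBackup")
        if last is None or now - last > max_age:
            stale.append(row["Identity"])
    return stale

# Hypothetical sample data standing in for the exported command output.
rows = [
    {"Identity": "SERVER1\\SG1\\DB1", "LastFullBackup": datetime(2009, 6, 1, 2, 0)},
    {"Identity": "SERVER1\\SG2\\DB2", "LastFullBackup": datetime(2009, 5, 29, 2, 0)},
    {"Identity": "SERVER1\\SG3\\DB3", "LastFullBackup": None},
]
now = datetime(2009, 6, 1, 9, 0)

flagged = stale_databases(rows, now)
for identity in flagged:
    print("Backup overdue:", identity)
```

In this sample, the second database is flagged because its last full backup is several days old, and the third because it has never had one. The point is to surface a silent backup failure before a restore is needed.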

There are a number of other options available in the event of a failure within your Exchange environment. Because Exchange 2007 allows empty mailbox stores and all user email configuration data is stored in Active Directory, bringing a new server online or bringing up a database on a different server are two of the more common ways to recover service and data. Either approach also buys the IT organization time to initiate mail store recovery.

Today, the need to restore databases has been all but eliminated, except in a serious failure. In the past, an email database would need to be restored if a user deleted a single email and later realized it was needed. Since Exchange 2007 has Deleted Item Retention, we are able to recover just about anything accidentally removed. Even email removed with a “shift – delete” can be recovered with a Registry Key “LINK TO THE MS DOCUMENT”.

PST Sprawl

Since not all email resides on the Exchange servers, personal folders, or PST files, provide yet another recovery challenge. As email volume and file attachments increase, both the number and the size of PST files go up. To ensure that these files are recoverable as well, we have implemented a client-based PC Backup program that is specifically designed for PST file backup. This program is more efficient than backing up PSTs with a server-based backup. Using a PC Backup program also puts the management of PSTs in the users’ hands. Our resulting policy is that PST files are not to be stored on file servers, primarily because PST files can put real pressure on server storage and on the backups of those servers. LSI is currently evaluating Exchange 2010 and its built-in archiving solution as a way to help reduce PST sprawl.

One last thing to note is that we perform annual health checks on our Exchange environment. This has helped us tremendously in being proactive and providing the needed uptime. The health check is performed by Microsoft as part of our Premier Support contract, and we consider it the single most important part of checking the health of any Exchange environment. Microsoft has a multitude of checks and best practices that are run against our environment. We take the results very seriously and take whatever steps are necessary to ensure the health of our Exchange infrastructure.

In closing, backup challenges exist in many areas, in environments like Exchange as well as in other types of data storage systems (e.g. SQL servers, file servers and other application servers). When data grows, so do the challenges. Managing our backups to our SLA is the key to meeting the needs of the business. If the growth of data affects our ability to meet these SLAs, then something needs to change. We can either grow the backup environment or better manage the data; however, we've found that a combination of the two tends to deliver the best results.

Robert England is Senior Manager, Client Server Integration, at LSI Corporation.

 
