Developing an Effective Data Protection Strategy

by Nordine Kherif

Suppose you arrive at work tomorrow morning and discover all the company data is gone. All your customer information, databases, billing, inventory and shipping records, project work, designs, prototypes, formulas--gone! How long could your company stay in business?

According to industry analysts, half of all businesses that lose their data go out of business shortly thereafter. Of those that do manage to stay alive, nine out of ten fail within two years.

It doesn't take much to wipe out a critical database of information. A hard disk crash, equipment failure or sudden loss of electricity can do it. Human error can lead to dropped computers or accidental erasure of data. Natural disasters abound from hurricanes to earthquakes, fires to floods. And, of course, there are malicious strikes from crackers, viruses and denial of service attacks.

Thus, the question isn't whether you will lose data but when. And although hardware, software and network equipment easily can be replaced, one of your most valuable assets--data--cannot.

The Need for Backup: Legislation and Privacy Laws

The last few years have witnessed a growing number of federal, state and local laws and regulations regarding data storage and privacy. Much of this has been driven by identity theft as well as the unintentional posting of sensitive data on the Internet.

Like it or not, it's your responsibility to keep pace with federal legislation, such as the Health Insurance Portability and Accountability Act (HIPAA) of 1996. Among other things, it created over 68 information security conditions for protecting the confidentiality, integrity and availability of individual health information. It also defines requirements for storing patient information before, during and after electronic transmission. The Gramm-Leach-Bliley Act (GLBA) of 1999 requires financial institutions to disclose the practices they have instituted to protect confidential information, while state laws such as California's SB1386 require all companies conducting business electronically in the state to report breaches of security that could compromise personal information.

Developing an Effective Data Protection Strategy

The time to create a backup strategy is before disaster strikes, so you can minimize data loss and business losses. Such a plan should be clear, specific and easy to follow and should incorporate several steps, starting with defining your needs. What you back up and how often you do it depends in part on the size of your company and the nature of your business. Consider these questions:

  • Are you backing up a single computer or thousands of computers?

  • Are your computers in a single location or in remote sites spread around the globe?

  • Does your data consist of a few megabytes weekly or hundreds of gigabytes daily?

  • Does your data repose on a single computer, or is it distributed among several servers?

  • Can your data be replaced easily, or is it so complicated that it's impossible to reconstruct?

  • Are you storing a few pages of text or thousands of megabyte-eating drawings/photos/videos?

  • Do you maintain normal business hours, or must your data be operationally available 24/7?

Once you answer those questions, you should prepare a needs assessment that addresses the following:

  • What information needs protection? What percentage still is active? How much is old or useless?

  • What is your recovery window? That is, how long can you survive without your data?

  • How much money will you lose for each hour your system is down?

  • Is your infrastructure set up for true backup and recovery?

  • Are your backup media reliable? Keep in mind that backing up files doesn't mean you are going to recover the original information. For example, many companies have sought to recover data from backup tapes only to discover the tapes are corrupt or failed to record, and data is lost.

  • Where are backup requests coming from: desk-bound workers? Remote locations? Mobile users?

Time Retention

Many industries have federal, state and/or local laws and regulations that stipulate how long you must archive certain data. Other industries, including medical and legal, have their own rules for document retention. It's your responsibility to keep up with these regulations and to understand the legal consequences if you don't comply. Consequences can include fines, sanctions or even orders to shut down your business. Such legal requirements are addressed by specific archiving solutions.

On the other hand, you also may have company-defined levels of importance for your data. This requires you to keep some information for longer specified periods of time, which generates additional costs.

What Is Your Backup Window?

How much time do you have to create backups? Must they be done hourly, daily or monthly? Remember, the more often you back up data, the greater the chances you can recover an exact copy of what you need. However, a cost is involved in the time it takes to back up data (which may impact daily business operations) as well as the storage costs for archiving it.

Costs

Costs may be overlooked when it comes to backup strategies. Such costs include hardware, software and, of course, storage itself. But don't forget the costs for someone to oversee and maintain the backup procedure. Also, keep track of licensing fees for software.

Creating a Data Protection Strategy

Basically, any data protection strategy contains three parts: backup, archive and recovery. Let's start by considering the backup portion of the strategy.

When preparing a backup plan, you must consider several important factors, starting with administration. Who is going to control your backups? Will it be a single administrator or a team with primary and secondary responsibilities for managing and maintaining backups on a regular basis? In all cases--even if your system is automated--someone must be responsible for verifying and maintaining the solution. Without this clear organization, your backups quickly may become unusable.

Important, too, are the management tools an administrator has. A good management system should tell administrators what data is and is not backed up, which data can be ignored, which data is accessed and how often. Also, the administrator often is responsible for updating and maintaining protective measures, such as anti-virus software and firewalls, as well as encryption schemes to make data harder to steal or corrupt.

Finally, establish a periodic system testing schedule. Many a company has needed to restore data only to discover the backup media is corrupted or blank.

What Backup Medium Is Best for You?

The most common systems for backing up data include tape, external disk drives and Zip cartridges; CDs and DVDs are used mainly for archiving. Newer solutions include jukeboxes filled with optical disks, network-attached hard disk storage, mirrored servers, Web-based storage and off-site storage in protected data centers. The choice you make depends on how failsafe your data backup must be, how much data you will back up and what you can afford.

Tapes are one of the most popular options for backing up data, because they are inexpensive and can hold massive amounts of data. Tapes also are easy to store off-site for security reasons. However, tapes can deteriorate over time. Also, to ensure the tape is recording properly, especially when using automated backups, you must perform regular restore simulations.

Selecting the proper tape solution depends in part on your defined backup window and the amount of data you want to protect. New high-end standards, such as LTO Ultrium and SDLT 320/600 drives, easily can handle terabytes of data.

Zip media are popular because they can hold several hundred megabytes of data. Unfortunately, they are expensive and can deteriorate over time. Portable hard drives that hold gigabytes of data can be a good choice if you want to back up a few local computers and then store the backups off-site.

Network-attached hard disk storage (NAS) consists of one or more storage servers attached to a regular local area network (LAN). Data is stored on the NAS servers rather than on the individual computers on the LAN.

CDs and DVDs are becoming popular because they are relatively inexpensive and hold large amounts of data: typically 700MB for CDs, 4.7GB for single-sided DVDs and 9.4GB for double-sided DVDs. Record-only (write-once) versions are excellent for archiving, because the data can't be overwritten once recorded. Rewritable DVD formats can be confusing, however, because there are several different versions and not all versions operate on all players.

Jukeboxes consist of multiple CDs, DVDs or optical disks and can be used to hold terabytes of data. The entire jukebox acts like a single storage medium, making it easy to find and retrieve data quickly. Such media are more appropriate for archiving purposes and, as such, are complementary to backup solutions.

Web-based storage can be fast and economical. You must, however, have complete confidence in the service's abilities and longevity.

Data centers also are excellent for protected off-site storage. Usually they include restricted access, security cameras and both theft and fire protection.

The primary objective of a backup strategy is to secure your business continuity from any major disaster. You therefore may want to consider a combination of media. For example, you may decide to use disk-to-disk backups for fast restorations and use archiving to maintain a full data copy on tapes stored off-site.

What Software Is Required?

Software should cover all your needs, whether it's hot backup for applications or system support. You also should make sure that:

  • The media still can be used even if the software is no longer available or is no longer in use.

  • The software offers a solution for its own recovery in case of a crash.

  • The software allows you to generate fast transfer rates to stream high-end tape drives such as LTO and SDLT.

  • You can manage all your backups through one backup server that has enough disk space for the index and enough processing power.

  • New software includes backward compatibility to protect you against obsolescence.

Software specifications you should consider include:

  • backup speed

  • compression ratios

  • automatic restart and recovery

  • multiple simultaneous backup and restoration features

  • reliability

  • customization

  • scalability

  • flexible plug-ins for specialized tasks, for example, database backups

  • library sharing technology to cost-effectively consolidate backup devices

  • ability to connect several servers to a single tape library

What Platforms Must You Back Up?

Your backup strategy is simplified if your entire network runs on a single platform such as Linux. Running a heterogeneous network that includes Linux, Unix, Windows NT, NetWare and/or Macintosh can become a backup nightmare. One solution is providing centralized management that allows an administrator to oversee and manage the system from a single point of control.

Mobile workers and telecommuters have their own unique requirements. They require the ability to back up data from any device at any time, from any place in the world. In addition, they must have quick access to all stored data.

Scalability

If your company plans to grow, you must create a backup strategy that keeps pace. The first step is to be prepared beforehand. Consider backup storage/media two times bigger than you need today--that amount anticipates 30% annual growth in your needs over the next three years.
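The "two times bigger" rule of thumb can be checked with a short compound-growth calculation. The sketch below assumes the 30% figure is an annual rate; the function name and the 1TB starting point are hypothetical and serve only as an illustration.

```python
# Sketch: compound growth of backup capacity needs.
# Assumption: the 30% growth figure is an annual rate.

def projected_capacity(current_gb: float, annual_growth: float, years: int) -> float:
    """Return the capacity needed after `years` of compound growth."""
    return current_gb * (1 + annual_growth) ** years

current_gb = 1000.0  # hypothetical: 1TB of backup data today
needed_gb = projected_capacity(current_gb, 0.30, 3)

print(f"Growth factor over 3 years: {needed_gb / current_gb:.2f}x")
print(f"Capacity to plan for: {needed_gb:.0f}GB")
```

Under that assumption the growth factor works out to roughly 2.2, which is where the advice to buy about twice today's capacity comes from.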

You also may want to consider larger platform and applications support than what you currently need. For example, if you currently shut down an Oracle database in order to back it up, that approach may become difficult in the future because your business will require higher availability.

Test Your Backup Plan

Test your system to ensure that you have good backups. Once or twice a year, recover one or several servers and/or databases. You also should test your media on a regular basis to ensure the backed up data is readable and accessible. If everything works, you're safe.
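One way to check that restored data is readable and intact is to compare checksums of the restored files against a reference copy. The following is a minimal sketch, not a feature of any particular backup product; the function names are hypothetical, and it assumes the reference tree has not changed since the backup was taken (for live data, compare against checksums recorded at backup time instead).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ in the restored tree."""
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        # A file fails the check if it is absent or its contents differ.
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(str(relative))
    return problems
```

Calling verify_restore with the reference directory and the directory a test restore was written to returns an empty list when every file matches; any path it does return deserves investigation before the backup is trusted.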

Begin by establishing a corporate policy specifying what constitutes a critical system. This can range from a server to a database distributed over several servers worldwide. Next, determine what actions must be taken. It could be backing up data hourly, daily or weekly. Backups may be incremental, with full backups occurring at specific times, such as monthly, quarterly or yearly. Finally, review the process itself. In a worst-case scenario, you want to ensure that the most data you ever lose doesn't exceed an hour or, at most, a day.

Archiving

Archiving data is important for several reasons. It may be required by law, regulations or industry standards. You may have valuable customer databases and/or historical data that needs to be saved. Or, you may have extremely large files that are rarely accessed and take up too much space on your system.

Whatever the reasons, keep in mind the two primary considerations for determining how to archive your data: safety and accessibility. Small to medium-sized companies may archive data simply on tape or DVDs and store them off-site in a fireproof safe. Large corporations, however, may prefer to archive data in a secure data center. Either also may consider using on-line storage facilities, in which case it's important to know where the backup servers reside--whether it's in somebody's office or, perhaps, co-hosted in a secure data center.

Recovery

If you can't recover and restore your data, your backup plan is useless. Recovery should include various contingency plans. For example, if a single server fails on a network, a backup server automatically should assume its place. If the entire facility is lost due to a natural disaster, you may want to have hot and cold site backup facilities that are maintained by outside vendors for an ongoing fee. Usually, a hot site contains spare computers to which you can bring your backed-up data to resume operations within 24 hours. Cold sites don't have spare computers and can take up to a week to become operational.

On a much smaller scale, you might have a laptop computer in reserve in case your desktop hard disk crashes. In this scenario, you simply transfer your backed-up files to the laptop and continue working.

Currently, some companies that can't withstand even a minute of downtime are installing direct lines between their networks and their recovery sites. Data is backed up continuously and always is available for immediate recovery.

Software companies such as Arkeia have special software modules for disaster recovery based on using a bootable CD that automatically rebuilds a backed-up client or server from scratch.

Finally, test your recovery plans at least twice a year. Be sure to do this testing without advance warning, because you rarely have advance warning of when disaster strikes.

Creating and implementing a backup/archive/recovery plan is time consuming and easy to put off until tomorrow. But if you're a typical company, one of these days there will be no tomorrow for your data.

Developing a Backup Strategy: Sample Case Study

Marc J works for XYZ Company, which provides Internet hosting services on the east coast. He's in charge of infrastructure security for some 500 PCs, 20 HP Proliant Web servers for the internal network and a couple of Solaris servers running Oracle databases. In addition, the company has a few Windows application servers and users' desktops with a SnapAppliance NAS box.

Marc has been tasked with setting up the backup infrastructure necessary to keep the company in business after full-site damage. The first step is defining what information must be secured. The core information is linked to the software environment, and the entire architecture is required to have a running system, including the company's Exchange mail server. After a few meetings and discussions, Marc determines the following requirements:

  • Protect the company's CRM.

  • Ensure customers using on-line remote services can access the company's Web servers at any time.

  • Secure customer data.

  • Get all desktops/work environments up and running in a day.

For each item Marc identifies data location, total amount, information type and the restoration window.

The company's CRM is based on a Web application front end running on a Windows server and a back-end application with a built-in Oracle database located on a Solaris server. The Windows front end holds less than 1GB of data, and the Solaris/Oracle database holds 50GB. The maximum amount of time the system can be down for customer database access is 24 hours.

The Web servers are identical HP Linux servers running an Apache Web server that connects to the remote 300GB MySQL database. This database is located on an HP Proliant DL 380 server running Red Hat Linux. Each machine contains less than 1GB of data on top of the system itself. A few of the machines are load balanced to handle peak connection loads. The Web servers must be up and running in less than 24 hours, while the additional Web services, including the MySQL database, can be down for a day.

Over 500 Intel RAQ machines are in use, each holding up to 80GB of data. Each runs a SuSE Linux system with an Apache server and a MySQL database. Marc calculates the total amount of customer data to be about 1.5TB.

Twenty desktops are in use running Windows 2000 with Microsoft Office. They all share disk space on a SnapAppliance NAS 14000, along with a few Linux boxes for the technical team. The total amount of used disk space on the SnapAppliance is 400GB. The desktops have to be up quickly enough to handle customers' daily incoming requests. The NAS box can be down for more than 24 hours, because the customer database is located on the CRM system described above.

Next, Marc determines exact retention times for each type of data. Although e-mail must be archived for a long period of time, there are no other legal retention requirements. Therefore, Marc determines the following retention times:

CRM Application Data

  • 1 monthly copy with 1-year retention

  • 1 copy archived at the end of fiscal year

  • Weekly total with 1-month retention

  • Daily incremental

Company Web server

  • Weekly total with 1-month retention

  • Daily incremental

As stated in the customer contract, Marc's company guarantees to recover any damaged server and restore it to the condition it was in 24 hours prior. The customer also can request an annual backup policy to receive several versions of his environment. Approximately 10% of XYZ's customers subscribe to this offer. Therefore, he figures the following:

For 90% of the Customers

  • Weekly total with 1-week retention

  • Daily incremental

For the Remaining Customers

  • 1 monthly copy with 1-year retention

  • 1 copy archived at the end of fiscal year

  • Weekly total with 1-month retention

  • Daily incremental
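Retention schedules like these are easier to enforce when written down as data. The sketch below encodes the two customer tiers and checks whether a given backup has outlived its retention period. The class, constant and function names are hypothetical, a month is approximated as 31 days and a year as 365, and the daily incrementals and fiscal-year archive are left out because the text gives no retention period for them.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    kind: str            # "weekly_full" or "monthly_full" (hypothetical labels)
    keep_for: timedelta  # how long a backup of this kind is kept


# 90% of customers: weekly total kept for one week.
STANDARD_TIER = [RetentionRule("weekly_full", timedelta(weeks=1))]

# Remaining customers: monthly copy kept a year, weekly total kept a month.
ANNUAL_TIER = [
    RetentionRule("monthly_full", timedelta(days=365)),  # assumed 365-day year
    RetentionRule("weekly_full", timedelta(days=31)),    # assumed 31-day month
]


def is_expired(kind: str, taken_on: date, today: date, rules: list) -> bool:
    """Return True if a backup of this kind is past its retention period."""
    for rule in rules:
        if rule.kind == kind:
            return today - taken_on > rule.keep_for
    return True  # no matching rule: this tier does not retain that kind


# A weekly total from nine days ago is expired for the standard tier
# but still retained for customers on the annual policy.
taken = date(2024, 1, 1)
today = date(2024, 1, 10)
print(is_expired("weekly_full", taken, today, STANDARD_TIER))
print(is_expired("weekly_full", taken, today, ANNUAL_TIER))
```

Keeping the rules in one table means a change in a customer contract becomes a one-line edit rather than a change scattered through backup scripts.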

Because XYZ's staff works from 8 am to 6 pm, it's been decided that customer data backups should run between 10 pm and 6 am to reduce the impact backups have on Web server availability.

Tape backup media is used to facilitate secure off-site storage. After some discussions, Marc decides that a backup throughput of 60MB/s can be handled easily by three 30MB/s LTO-2 tape drives (90MB/s total) or five 15MB/s LTO-1 tape drives (75MB/s total).
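The drive sizing can be sanity-checked against the backup window. This sketch assumes a worst case in which all 1.5TB of customer data must be written within the eight-hour window from 10 pm to 6 am, and it uses decimal units (1TB = 1,000,000MB). The function name is hypothetical; the drive speeds are the ones quoted in the case study.

```python
def required_rate_mb_s(data_tb: float, window_hours: float) -> float:
    """MB/s needed to write data_tb terabytes within window_hours."""
    return data_tb * 1_000_000 / (window_hours * 3600)


customer_data_tb = 1.5  # total customer data from the case study
window_hours = 8        # 10 pm to 6 am

needed = required_rate_mb_s(customer_data_tb, window_hours)

# Aggregate drive throughput in MB/s, as quoted in the case study.
options = {"3 x LTO-2": 3 * 30, "5 x LTO-1": 5 * 15}

print(f"Required: {needed:.1f} MB/s")
for name, rate in options.items():
    verdict = "enough" if rate >= needed else "too slow"
    print(f"{name}: {rate} MB/s, {verdict}")
```

Under these assumptions the required rate comes to about 52MB/s, which both drive configurations exceed with some headroom. Real throughput also depends on how fast the servers and the network can feed the drives.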

To complete his backup device choice, Marc needs a tape library large enough to manage the defined backup policy and the retention time. To keep a monthly backup cycle on-line in this robotic library, Marc defines the on-line data for each segment.

CRM Application Data (total ~ 50GB)

  • 1 monthly copy with 1-year retention => 0 GB

  • Weekly total with 1-month retention => 50 GB

  • Daily incremental => 50 x 0.2 x 5 => 50 GB
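The daily incremental figure multiplies three numbers that the text does not explain. One plausible reading, and it is only an assumption, is that about 20% of the 50GB data set changes each day and that five incrementals are taken per week. The variable names below are hypothetical.

```python
# Sketch of the CRM line items; the meaning of 0.2 and 5 is an assumption.
crm_total_gb = 50          # CRM data set size from the case study
daily_change_rate = 0.2    # assumed: 20% of the data changes each day
incrementals_per_week = 5  # assumed: one incremental per working day

weekly_full_gb = crm_total_gb
daily_incremental_gb = crm_total_gb * daily_change_rate * incrementals_per_week

print(f"Weekly total: {weekly_full_gb}GB")
print(f"Daily incrementals per week: {daily_incremental_gb:.0f}GB")
```

Read this way, a week of incrementals takes about as much space as one full copy, which is worth knowing when choosing between more frequent full backups and longer incremental chains.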

Additional on-line data requirements include about:

  • 150GB for CRM applications

  • 1.2 TB for company Web server

  • 3.15 TB for customer data

  • 1.2 TB for desktops and Windows servers

As a result, 5.7TB of on-line data is generated each month. Applying the normal compression ratio of 1.5, this amount is reduced to 3.8TB of native data.

Therefore, Marc determines his company requires 40 LTO-1 tape cartridges (100GB each) or 20 LTO-2 tape cartridges (200GB each). His final choice is an IBM LTO Ultrium Scalable Tape Library 3583 with 6 LTO-1 drives and 72 cartridges.
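The media count follows from the on-line data totals and the compression ratio. The sketch below repeats that arithmetic using the figures from the case study; the variable and function names are hypothetical, and decimal units (1TB = 1000GB) are assumed.

```python
import math

# On-line data per segment, in GB, from the case study.
online_gb = {
    "CRM applications": 150,
    "company Web server": 1200,
    "customer data": 3150,
    "desktops and Windows servers": 1200,
}
COMPRESSION_RATIO = 1.5  # compression ratio quoted in the text

total_gb = sum(online_gb.values())
native_gb = total_gb / COMPRESSION_RATIO


def cartridges_needed(data_gb: float, cartridge_gb: int) -> int:
    """Smallest whole number of cartridges that holds data_gb."""
    return math.ceil(data_gb / cartridge_gb)


print(f"Total on-line data: {total_gb}GB")
print(f"After compression: {native_gb:.0f}GB")
print(f"LTO-1 (100GB): {cartridges_needed(native_gb, 100)}")
print(f"LTO-2 (200GB): {cartridges_needed(native_gb, 200)}")
```

The computed minimums are 38 LTO-1 or 19 LTO-2 units of media, just under the rounded figures of 40 and 20 quoted above. A 72-cartridge LTO-1 library offers 7.2TB of native capacity, which leaves room for growth and for tapes rotated off-site.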

Nordine Kherif is Director of Support Services at Arkeia Corporation (www.arkeia.com).
