Backup Strategy

Everyone tells you how important it is to make backups. Explicit guidelines, however, are often lacking. Which files should you back up, and how often? This article will help you answer those questions and use the answers to develop a backup strategy of your own.
User Backups

User backups are different from system backups in that a user's files are liable to change frequently. It will almost certainly be impossible for you to have up-to-the-minute backups of a given user's file space, and you shouldn't even try. In backing up user files, you are offering your users a safety net—reasonably recent copies of their files that they can fall back on if they do something silly (like typing rm * .bak instead of rm *.bak—it does happen!), or if the hard disk fails.

User backups will have to be done much more frequently than system backups, perhaps even daily (the cron program enables you to run programs at regular intervals, without having to issue the same commands each time—see the cron sidebar).
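As a rough sketch, a crontab entry along these lines (the script name and path are made up for illustration) would run a backup script at 2:30 every morning:

30 2 * * * /usr/local/sbin/backup_users

The five fields give the minute, hour, day of month, month, and day of week; an asterisk means "every", so this entry fires daily.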

One useful feature of many backup programs (including tar) is the ability to back up only those files which have changed since a certain date (the last time you did a backup, for example). This can drastically reduce the amount of work in a user backup, since a user is likely to be working on only a small number of files at a given time. You can combine occasional full backups of your user space with more frequent incremental backups.

While it is possible to use floppy disks for your backups, each disk can only hold a small amount of data. Many programs allow a backup to span several disks, but this means that you have to be there to change them while the backup is taking place. If you only have a small system with few users, then this might be feasible, but often it isn't. Magnetic or digital tapes are probably a better choice, simply because of their higher capacity. Linux supports a wide range of tape drives, either via the ftape module or its SCSI support (digital drives are almost always SCSI). The price of tape drives has fallen quite dramatically in the last 18 months or so, and they are now a realistic option for many of us. Alternatively, your Linux box might be on the same network as another machine with a tape drive. Linux can access tapes on remote machines, but that is beyond the scope of this article.

Whatever media you choose, you should look after it. Your backup is there for when things go wrong, so it is important that you can rely on it. You should always verify your backups; it is often said that an unverified backup is worse than no backup at all.

You should also keep more than one set of backups. A popular strategy is based on the “grandfather-father-son” idea: you keep three sets of backups, the most recent (the son), the one before that (the father), and the one before that (the grandfather). When you do your next backup, you overwrite the grandfather, so the son becomes the father, the father becomes the grandfather, and the new backup becomes the son. The advantage of this strategy is that if one of the sets fails you still have something to fall back on, yet you never have to make more than one backup at a time.
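If your backup sets live in disk files rather than on tapes (with tapes you would simply label three cartridges and cycle them by hand), the rotation can be sketched in three commands. The archive names and the /backups directory here are purely illustrative:

$ mv -f /backups/father.tar /backups/grandfather.tar
$ mv -f /backups/son.tar /backups/father.tar
$ tar -cWf /backups/son.tar /home

The old grandfather is overwritten, each surviving set moves back a generation, and the new backup becomes the son.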

The next piece of advice might sound strange at first: always keep at least one backup well away from your machine, preferably in a completely different building. Why? Well, what if the building burns down? You can replace the machine, and get a new Linux distribution, but you won't be able to replace your backup tapes. The data on your computer is its most valuable and irreplaceable component, so treat it with care.

How?

Okay, enough of the chat—let's see some examples. There are many different backup programs available, both freeware and commercial. Each has its merits, but for these examples, I'm going to use tar (GNU version 1.11.2).

Suppose you've just installed a lot of new software in /usr/local, and think it's time you updated your backup of the whole /usr/local tree. You don't have a tape drive, so you're using floppies. A command like:

$ tar -cWMf /dev/fd0 /usr/local

will do the trick. The c option means create an archive, W means attempt to verify the archive after writing, M tells tar to span more than one floppy if it needs to, and the f option tells tar where to write the archive, in this case to /dev/fd0—the floppy disk drive. On many systems, you will have to be root in order to access /dev/fd0 directly.

Even though I've requested verify, it doesn't hurt to check. The command:

$ tar -tMf /dev/fd0

will show a list of all the files backed up. Depending on the size of your /usr/local tree, you might need several floppies. You could reduce the number of disks needed by using tar's compression option; the z flag will tell tar to filter the archive through gzip, thus saving disk space. A good idea? Well, yes and no. While it is attractive to save disk (or tape) space, compressing a lot of files together is risky. It means that the slightest corruption is likely to destroy the whole backup, whereas if the archive is uncompressed, it might be possible to read past any errors, and retrieve at least some of your data. Some programs compress files individually before backing them up, and this is probably a better idea.
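If you do decide whole-archive compression is worth the risk, the z flag slots in alongside the other options. One caveat (check your version of tar): GNU tar generally cannot compress a multi-volume archive, nor verify a compressed one, so this sketch writes to a tape device rather than spanning floppies:

$ tar -czf /dev/ftape /usr/local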

I mentioned earlier that it is possible to back up files that have been modified since a certain time. With tar, you can achieve this using the N option. For example,

$ tar -cf /dev/ftape -N yesterday /home

will back up all files under /home which have been altered since yesterday, this time writing to a floppy tape device, /dev/ftape. An alternative approach is to use a combination of find and tar:

$ find /home -cnewer /etc/last_backup \
  -type f -print > back_these_up
$ tar -cf /dev/ftape -T back_these_up
$ touch /etc/last_backup

Here, the find command finds all files under /home that have changed since the file /etc/last_backup was last modified (strictly, -cnewer compares each file's status-change time, which is updated when contents or attributes change), and writes their names to a file called back_these_up. The T option tells the tar command to back up the files listed in back_these_up. Then we touch the file /etc/last_backup, so that the next time we run this sequence of commands, we get the files that have been modified since this backup. Combining several commands like this is quite useful; as a side effect, we have a list of the files that were backed up, as well as the time of the last backup (the timestamp of /etc/last_backup).

Another thing we could do is to filter the list of files, so that certain files don't get backed up. For example, you might not want to back up object files, or DVI files, since they can easily be recreated from the source code (which is usually a much smaller file!). A simple grep -v will do the trick if there is only one kind of file you want to ignore; egrep can be used to ignore several kinds of files. Change the first line above to something like:

$ find /home -cnewer /etc/last_backup \
  -type f -print | egrep -v '\.o$|\.dvi$' \
  > back_these_up

to ignore object and DVI files. For a simple case like this you can achieve the same thing with find alone, although it lacks egrep's powerful regular expressions:

$ find /home -cnewer /etc/last_backup \
  -type f ! \( -name \*.o -o -name \*.dvi \) \
  -print > back_these_up

It is likely that your exact backup requirements can't be met easily by a single tar command, so don't be afraid to write your own little scripts to do the job. They can be as simple as the three-line example above, or as complicated as you like. A few simple scripts, run regularly using cron, can make backing up a very easy process.
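As a minimal sketch, the three commands above could be collected into just such a script (the device and file names are the illustrative ones used earlier; note the absolute path for the file list, since cron will not run the script from your own directory):

#!/bin/sh
# Incremental backup of /home: save everything changed since the
# last run, then record the time of this run.
find /home -cnewer /etc/last_backup -type f -print > /etc/back_these_up
tar -cf /dev/ftape -T /etc/back_these_up
touch /etc/last_backup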

Backing up needn't be a protracted form of torture. It needs to be done, and as a sysadmin you have to do it, but a bit of planning and clear thinking goes a long way. It is easy to feel that you must have a complete current snapshot of your entire hard disk at all times, and equally easy to believe that a six-month-old copy of a few files lying about somewhere will do. The best strategy lies somewhere in between.

Malcolm Murphy (Malcolm.Murphy@bristol.ac.uk) remembers a time when 256K of memory was considered more than enough for all your computing needs, instead of the bare minimum cache requirement, and wonders if we aren't just a little spoiled nowadays.
