The Skinny on Backups and Data Recovery, Part 2

A cure for the "no tape drive, but I got a CD writer" blues.

This week, I will go out on a limb and make a gross generalization. If you're ready, here it is. An amazing number of Linux users and admins out there have a CD-rewritable (CD-RW) drive and no tape drive. You happily burn CDs to make collections of your favorite tunes, but have you thought about using that drive to create backups of your data?

A quick step back. Here's the scoop. There are ways to do backups right, but I want to start from worst-case scenario to best case. Meaning, soon we will be talking about tape drives. Some people out there might argue that "tape drives" sounds like an archaic way to deal with data storage. I'll answer that challenge next week. For now, I'm going to pretend you have a CD-recordable unit on your system, and show you how to use it for backups.

The question of whether a CD-RW is a good backup choice is sometimes settled in this way. You can afford either a tape drive (sometimes more expensive than the CD-RW) or the CD-RW, but not both. When our machines get used for both business and pleasure, as is often the case with home offices, we tend to lean toward the device that serves both.

Making collections of favorite songs and burning extra Slackware or Debian CDs is usually done with something called cdrecord. cdrecord is not specifically Linux software, and will compile and run on a number of different platforms. In case you don't already have cdrecord (maybe you just got a CD-RW last night), you can pick up the latest copy at this address:

     http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/cdrecord.html

The other thing you will need if you want to use your drive is a little package called mkisofs. Luckily, the latest version of cdrecord already includes mkisofs, but it is still available at this address:

     ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/

I'm going to work on the premise that the reason we are having this discussion is you already have a working CD writer setup. You got a good deal; the writer makes great song collections, and now you think using that same device for backups is a great idea.

The problem with doing backups using CD writers and re-writers is that they were never intended for that. Normally, a raw image, based on a mirror of the data you intend to capture, is written to your hard disk. You then write that image to your CD or CD-RW. Essentially, you need free space equal to twice the amount of data you are trying to back up: once for the mirror and once for the image. In the case of a full CD at something like 650 meg, you must have 1.3 gigabytes of space. It's not necessarily the friendliest way to do backups (and certainly not the space-friendliest). Luckily, there's a way to cut the necessary space in half and still get your backups done. With a sufficiently fast system, you can simply pass the iso9660 image data that is being created directly to the cdrecord command. That means we no longer need room for both the backup tree and an ISO image to burn onto the CD, only for the backup tree itself.

One way to do this is to beef up last week's identity backup script. If you remember, we created a temporary directory with a hierarchical "mirror" of our important data, then backed up that smaller mirror to a diskette. With that script, I gave you a very small list of files and suggested that your choice of what's "important" may be different from mine. After all, you can fit roughly 1.4 meg on a single diskette, and my example identity backup used only 37K.

With the CD-RW, we can increase that size to roughly 650 megabytes, which may be all you need for the things that change day to day. Remember the catch (or half-catch, now): you still need that spare 650 megabytes into which you can recreate the structure you want to back up. We don't get away that easy. If you plan on backing up only 300 meg of data, then you'll need the 300 meg of space. The reason it's good to remember last week's example is that it is a micro-example of what we are about to do here.
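Before building the mirror, it can help to confirm that the target filesystem really has the room. What follows is a minimal sketch, not part of the original script: free_kb and has_room are hypothetical helper names, and the 300 meg figure in the usage comment is simply the example from the paragraph above.

```shell
#!/bin/bash
# Sketch only: free_kb and has_room are hypothetical helpers.

# free_kb DIR: print the kilobytes available on the filesystem holding DIR.
free_kb() {
    # -P asks df for the POSIX format (one line per filesystem),
    # -k reports in kilobytes; column 4 of line 2 is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_room DIR NEEDED_KB: succeed if DIR's filesystem has NEEDED_KB free.
has_room() {
    local avail
    avail=$(free_kb "$1")
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Example: a 300 meg backup needs 300 * 1024 kilobytes of mirror space.
# has_room /mnt/data1 $((300 * 1024)) || echo "Not enough room for the mirror"
```

Run something like this before the rm and mkdir steps, so you find out about a full disk before the copy starts rather than halfway through it.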

We start by creating our mirror. In this case, it is a directory called /mnt/data1/data_backup. On my system, /mnt/data1 is a separate drive with a fair amount of free space. It's where I keep images of Linux CDs, which I then push on my friends in an effort to get them off that other operating system. On your system, the mirror will most likely be in a different location. Just make sure the space is available. Here is that script.

#!/bin/bash
# script name : backup_to_cd
# This script does a backup of important files onto the CD-RW
# Marcel Gagne, 2000
#
# NOTE: my "data mirror" is /mnt/data1/data_backup
#
echo "Starting by Blanking the data_backup area"
rm -rf /mnt/data1/data_backup
echo "Recreating the data_backup mirror ..."
mkdir /mnt/data1/data_backup
mkdir /mnt/data1/data_backup/usr
mkdir /mnt/data1/data_backup/etc
#
echo "Backing up to data1 disk mirror area ..."
cd /
find home -print | cpio -pduvm /mnt/data1/data_backup
find root -print | cpio -pduvm /mnt/data1/data_backup
find usr/local -print | cpio -pduvm /mnt/data1/data_backup/usr
#
echo "Backing up system identity."
cd /etc
for ident_names in passwd group shadow profile bashrc sendmail.cw sendmail.cf \
    hosts hosts.allow hosts.deny named.conf named.boot aliases
do
    cp -v "$ident_names" /mnt/data1/data_backup/etc
done
find nsdata -print | cpio -pduvm /mnt/data1/data_backup/etc
find sysconfig -print | cpio -pduvm /mnt/data1/data_backup/etc
find mgetty+sendfax -print | cpio -pduvm /mnt/data1/data_backup/etc
#
echo "All files saved. Ready to begin CD copy." 
echo "Shall I blank the CD first?"
read the_answer
#
cdrecord -blank=fast dev=3,0
#
echo "Shall I start the CD burn now?"
read the_answer
#
mkisofs -R /mnt/data1/data_backup | cdrecord -v dev=3,0 -

Notice that in my /mnt/data1/data_backup mirror, I am capturing /home, /usr/local and /root, none of which I was paying much attention to with my original identity_backup script. After creating our mirror, we burn the data to our disk.

Yes? Ah. The reader in the back has a good point. I'm not really doing anything at those prompts for blanking and copying the CD (where it says read the_answer) other than pausing. Since the amount of data in my mirror can be pretty dynamic, not to mention downright huge, I want an opportunity to do a du -sk /mnt/data1/data_backup to verify that I'm staying within that 650 meg limit.
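If you would rather let the script do that du check for you, here is one way it might look. This is a sketch under assumptions: CD_LIMIT_KB and mirror_fits are hypothetical names that do not appear in the original script, and the 650 meg limit is the round figure used in this article, not a value read from the disc.

```shell
#!/bin/bash
# Sketch only: CD_LIMIT_KB and mirror_fits are hypothetical names.
CD_LIMIT_KB=$((650 * 1024))   # 650 meg expressed in kilobytes

# mirror_fits DIR [LIMIT_KB]: succeed if DIR uses no more than LIMIT_KB.
# Returns 2 if DIR is not a directory.
mirror_fits() {
    local dir="$1"
    local limit_kb="${2:-$CD_LIMIT_KB}"
    local used_kb
    [ -d "$dir" ] || return 2
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
    used_kb=$(du -sk "$dir" | cut -f1)
    echo "Mirror uses ${used_kb}K of ${limit_kb}K"
    [ "$used_kb" -le "$limit_kb" ]
}

# Usage, just before the burn:
# mirror_fits /mnt/data1/data_backup || { echo "Too big for one CD"; exit 1; }
```

Keep in mind that du reports disk usage of the mirror, and the ISO image adds some overhead of its own, so it is wise to leave a margin and not fill the limit to the last kilobyte.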

Back to the script. Since I am using a re-writable CD, I blank my CDs before starting. This is done with the line

   cdrecord -blank=fast dev=3,0

I use the -blank=fast option to quickly erase the table of contents from the disk. You have the option of blanking the entire disk, but that can take a long time.
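Because the script as written blanks the disk no matter what you type at the prompt, you may prefer a version in which the answer actually decides. The following is a minimal sketch: confirm is a hypothetical helper of my own naming, and the cdrecord line in the usage comment is the one from the script, device address and all.

```shell
#!/bin/bash
# Sketch only: confirm is a hypothetical helper, not part of the script above.

# confirm PROMPT: ask a yes/no question on the terminal.
# Succeeds only if the reply is y or yes (any case); anything else,
# including an empty reply or end of input, counts as "no".
confirm() {
    local reply
    printf '%s [y/N] ' "$1"
    read -r reply || return 1
    case "$reply" in
        [yY]|[yY][eE][sS]) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, with the device address from the article:
# if confirm "Shall I blank the CD first?"; then
#     cdrecord -blank=fast dev=3,0
# fi
```

Defaulting to "no" is a deliberate choice here: blanking destroys the previous backup, so a stray Enter should leave the disk alone.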

The real magic happens at the end of the script with "mkisofs" and "cdrecord". The -R option on mkisofs means I want the Rock Ridge extensions to be used. In other words, a UNIX file system with user and group information, long filename support, etc. That's about it. One other thing, though: just as in our last example, this script is meant to be a jumping-off point for your own CD backup. What I consider important in my backup may differ wildly from yours.
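As you adapt the list of identity files to your own system, some of the names in my list may not exist on yours (not every machine has a named.boot, for instance). One possible way to handle that, sketched here with a hypothetical helper called copy_existing, is to skip missing files quietly and copy only what is there.

```shell
#!/bin/bash
# Sketch only: copy_existing is a hypothetical helper.

# copy_existing SRC_DIR DEST_DIR NAME...
# Copy each regular file NAME found in SRC_DIR into DEST_DIR,
# noting on stderr any name that is skipped.
copy_existing() {
    local src="$1"
    local dest="$2"
    local name
    shift 2
    mkdir -p "$dest" || return 1
    for name in "$@"; do
        if [ -f "$src/$name" ]; then
            # -p preserves mode and timestamps where possible
            cp -p "$src/$name" "$dest/"
        else
            echo "Skipping $name (not found in $src)" >&2
        fi
    done
}

# Usage, with the mirror path from the article:
# copy_existing /etc /mnt/data1/data_backup/etc passwd group hosts named.boot
```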

For the curious, here is the normal chain of events in creating a CD. You would have mkisofs write out an ISO9660 image, which would then get recorded on the CD. The final backup onto the CD would then happen in two passes. For instance, the commands would be:

   # First we create the image based on a previously done "mirror" backup
   mkisofs -o /another_dir/image.iso -R /mnt/data1/data_backup  
   # Now, we record the image to CD
   cdrecord -v dev=3,0 /another_dir/image.iso

Incidentally, if you have a few ISO images hanging around on your disk (spare Linux CDs for your friends, etc.), you can mount those images and navigate them as you would any CD filesystem. Here's how. Pretend your debian.iso image is sitting out there on your disk, and you want to look at it. First, you create a mount point (mkdir /mnt/debdist). Next, using this command, you can mount the image:

   mount debian.iso -r -t iso9660 -o loop /mnt/debdist

The -r option means read-only. As you can see, it is very much like a CD filesystem.
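The same loopback trick offers a way to check an image made with the two-pass method against its mirror before you commit it to a CD. This is only a sketch: trees_match is a hypothetical helper, and /mnt/isocheck is a made-up mount point. Note that diff -r compares file names and contents, not ownership or permissions.

```shell
#!/bin/bash
# Sketch only: trees_match is a hypothetical helper.

# trees_match DIR_A DIR_B: succeed if both trees contain the same
# files with the same contents (ownership and modes are not compared).
trees_match() {
    diff -r "$1" "$2" >/dev/null
}

# Usage (as root), with the image and mirror paths from the article
# and a hypothetical mount point:
# mkdir -p /mnt/isocheck
# mount /another_dir/image.iso -r -t iso9660 -o loop /mnt/isocheck
# trees_match /mnt/data1/data_backup /mnt/isocheck && echo "Image matches mirror"
# umount /mnt/isocheck
```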

Next time, I'll take up that challenge from earlier, and show you why tapes are still where it's at, and just how flexible those "old-fashioned" devices can be. On that note, we wrap up another week here at the corner. Until we chat again, remember that when you're down, only a good backup will get you back up.

______________________

Comments

Linux Disaster Recovery Software

Thanks for the useful information. Linux Backup and Disaster Recovery is a major issue for system admins.

Re: The Skinny on Backups and Data Recovery, Part 2

This is a great series on making good use of automating backups. I found that piping mkisofs into cdrecord caused errors though. I simply added an extra line to call cdrecord by itself and the problem is solved.

Re: The Skinny on Backups and Data Recovery, Part 2

I can't get past the "dir/foo and dir2/foo have the same Rock Ridge name" error. I have read the man page 100 times because Jörg Schilling made a comment that this is a _documented_ bug, but I can't find the documentation anywhere.
Can anyone tell me what version of mkisofs and what options will let me back up a unix filesystem that *may* contain the same named file in different directories?
Please feel free to email me at
ispconsultant AT yahoo DOT com

Thanks
