The Skinny on Backups and Data Recovery, Part 3
Welcome back to another session of fun and sun in the Sysadmin's Corner, where every day is a holiday because nothing beats the fun of administering your own Linux system. Today, we continue with our foray into the world of backups.
Your Linux system actually has a number of options for doing backups. The most popular commands are cpio and tar. Other command-line tools such as dd, dump and afio are available (although afio might not be included in your distribution). tar and cpio are still quite useful and powerful, and are the ones I usually use myself.
The main advantage of cpio over tar is that it does somewhat better packing of data on your backup medium and it handles errors better as well, particularly when dealing with tape. There is one other advantage. When using tar, you tend to work in terms of short lists of directories or files. With cpio, you can largely customize the files that wind up in an archive. For instance, you can work from a list of files and pipe that list directly to cpio:
cpio -ov > /dev/st0 < /tmp/list_of_files
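The -o option tells cpio to create an archive. It reads the names of the files to archive from standard input (here, the file /tmp/list_of_files) and writes the archive to standard output, which I have redirected to the tape drive, /dev/st0. The -v option lists each file as it is written. I could then go back and check the backup by reading the tape and checking the table of contents. This is done with the -t option: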
cpio -ivt < /dev/st0
To extract a file from that archive, I need to know how it was stored (that is, the path recorded in the archive, which the table of contents will show). The command is fairly simple. Let's say I want to restore a file called "lost_file". I would do it like this:
cpio -iv lost_file < /dev/st0
I also want to cover tar. I confess that I probably use tar more than any other command. It's partly out of habit, since I have been tarring files for a number of years, and it's also partly because single-file archives are always delivered tarred and in some way compressed (as in cool_new_software.tar.gz).
Linux's tar isn't your plain old tar; it's GNU tar. This means you have some advantages, making it that much more interesting. For instance, GNU tar makes it possible to do compression on the fly (with both standard compression and gzip compression). You can also specify multi-volume backups. Here are some examples.
tar -cvf /dev/fd0 /mydata
With this, I am backing up the directory /mydata to my floppy. Every once in a while, nothing beats the convenience of using diskettes. Unfortunately, as I mentioned earlier on, there's only so much space on a diskette. What if my floppy is really too small for that amount of data? No problem (assuming I don't need a HUGE number of floppies). Just use the -M flag, and you will be prompted for the next volume in your multi-volume backup. Like this:
tar -cvMf /dev/fd0 /mydata
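If you want to try a multi-volume backup without feeding diskettes to the drive, the sketch below may help. It assumes GNU tar, uses small files on disk in place of floppies, and caps each volume at a made-up 100 KiB with the -L flag. The volume names and sizes are my own invention for the example.

```shell
#!/bin/bash
# Sketch only: assumes GNU tar. The volume files stand in for floppies, and
# the names and the 100 KiB volume size are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir mydata restore
head -c 153600 /dev/zero > mydata/big_file   # 150 KiB of test data

# -M: multi-volume; -L 100: treat each volume as 100 KiB long.
# With several -f options, tar moves on to the next name when a volume
# fills up instead of prompting for a new diskette.
tar -cM -L 100 -f vol1.tar -f vol2.tar -f vol3.tar mydata < /dev/null

# Restore by naming the volumes in the same order.
tar -xM -f vol1.tar -f vol2.tar -f vol3.tar -C restore < /dev/null
cmp mydata/big_file restore/mydata/big_file && echo "restore matches"
```

With a real floppy you would leave out -L and the extra -f options and simply swap disks when tar asks.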
To compress data (you can't use compression on multivolumes; sorry), you can use the -z flag. For instance, if I want to archive /mydata and gzip it to my tape drive on the fly, I would use this command:
tar -czvf /dev/st0 /mydata
For good old-fashioned zcat type of compression (as with the compress command), use a capital Z flag instead of the lowercase z.
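To get a feel for what on-the-fly compression buys you, here is a small sketch that archives the same directory with and without -z and compares the results. The archives are files on disk rather than /dev/st0, and the directory and file names are hypothetical.

```shell
#!/bin/bash
# Sketch only: archive and directory names are hypothetical, and the
# archives are files on disk rather than /dev/st0.
set -e
work=$(mktemp -d)
cd "$work"
mkdir mydata
for i in 1 2 3; do
    yes "line of sample data $i" | head -n 2000 > "mydata/file$i.txt"
done

tar -cf plain.tar mydata          # no compression
tar -czf gzipped.tar.gz mydata    # gzip on the fly with -z

# Compare the sizes, then list the contents of the compressed archive.
ls -l plain.tar gzipped.tar.gz
tar -tzf gzipped.tar.gz
```

How much you save depends entirely on the data. Repetitive text like this shrinks dramatically, while already-compressed files barely change.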
On the question of medium, here's how I see things. Floppies are extremely convenient for storing small collections of files (like that identity backup from a couple of weeks ago), but their capacity is very limited. CD-RW is reliable, access is reasonably quick, but the capacity (while much larger than diskette) is still limited. A 2GB Jaz drive is quite hot, but 2 gigabytes is pretty much it at the moment, and spare cartridges can be pricey. A 2GB cartridge sells for roughly $150 in my neighborhood. It's good for infrequent backups and data that doesn't change much, but there are limits. You could use a spare hard disk, but multiple archives of your data are a bit difficult with that scenario. It's very fast, though. Then, there's tape.
Tapes themselves are relatively inexpensive (a 12GB DDS tape costs about $25 around here). The number one advantage to tape, however, is capacity. There is no other medium, short of maybe another disk, that provides the backup capacity of today's tape drives; certainly not at a comparable cost. In terms of capacity, it is now possible to get Linux-compatible tape drives that will back up an astounding 50GB on a single tape.
Large capacity makes for another great advantage: unattended backups. You can pop in a tape before you head out for the night, rather than sitting around watching files list to the screen. With a little ingenuity, you can verify a backup, capture a list of what has been backed up and have the result mailed to you in the morning. For instance, take a look at this hastily constructed script which (I am quite sure) could be a lot prettier.
#!/bin/bash
#
# 4mm.dataonly - This Short Backup Script backs up only my data
# 2000 - Marcel Gagne
#
# Set up some file pointers for short backup
sb_log=/usr/local/.Admin/dataonly.log
sb_errlog=/usr/local/.Admin/dataonly.err
# Do we capture the file list, or send it to dev null?
# file_log=/usr/local/.Admin/backup.log
file_log=/dev/null
admin_dir=/usr/local/.Admin
# Do a little cleanup.
mv $sb_log $sb_log.old
mv $sb_errlog $sb_errlog.old
# Prepare report headers
#
echo "==============================================" > $sb_errlog
echo "What follows is a report of errors encountered" >> $sb_errlog
echo "during the backup or its subsequent verify." >> $sb_errlog
echo "==============================================" >> $sb_errlog
echo "Data Only Nightly Backup. <`date`>" >> $sb_log
echo "============================================================" >> $sb_log
#Get on with actual backup
#
echo "** Moving to data directory..." >> $sb_log
cd /root
echo "***Nightly Backup Starting : `date`..." >> $sb_log
echo "Backup errors ..." >>$sb_errlog
tar -cvf /dev/st0 . 2>>$sb_errlog
# Verify Backup
# start by rewinding the tape
mt -f /dev/st0 rewind
echo "****Verifying the Backup : `date` *** " >> $sb_log
echo "Restore and verify errors . . ." >>$sb_errlog
tar -vtf /dev/st0 2>>$sb_errlog
echo "*****Nightly Backup Completed : `date`..." >> $sb_log
# Report on this, will you?
cat $sb_errlog >> $sb_log
mail -s "Dataonly backup status report" root < $sb_log
When I come in the next morning, my backup has completed and I have an e-mail message telling me when it started and when it finished. If tar wrote any messages to STDERR (standard error) during the backup or the verify, I'll see them in that message.
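One way to make that morning e-mail easier to read at a glance is to check tar's exit status and put the result in the subject line. What follows is only a sketch of the idea, not a replacement for the script. The function name backup_and_verify, its arguments and the log layout are all hypothetical, and a file on disk stands in for the tape.

```shell
#!/bin/bash
# Sketch only: backup_and_verify, its arguments and the log layout are
# hypothetical; a file stands in for the tape device.

# Usage: backup_and_verify SOURCE_DIR ARCHIVE LOG
# Returns 0 if both the write and the read-back succeed.
backup_and_verify() {
    local src=$1 archive=$2 log=$3 status=0

    echo "Backup starting: $(date)" >> "$log"
    # Write the archive; tar's error messages go to the log.
    tar -cf "$archive" -C "$src" . 2>> "$log" || status=1

    # Read the whole archive back as a basic verify.
    if [ "$status" -eq 0 ]; then
        tar -tf "$archive" > /dev/null 2>> "$log" || status=1
    fi

    echo "Backup finished: $(date), status $status" >> "$log"
    return "$status"
}

work=$(mktemp -d)
mkdir "$work/data"
echo "hello" > "$work/data/file.txt"

if backup_and_verify "$work/data" "$work/backup.tar" "$work/backup.log"; then
    subject="Backup OK"
else
    subject="Backup FAILED"
fi
echo "$subject"
# A real script would now do something like:
#   mail -s "$subject" root < "$work/backup.log"
```

Reading the table of contents back only proves the archive is readable from start to finish. It does not compare the archived data with what is on disk.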
Another thing I toss in there is an option to list the files on backup and restore. WARNING! This can chew up a lot of disk space, which is why my file_log has the option of going to either a file or to /dev/null. Since my 4mm DAT can back up 4 to 8 gigabytes, I can have this happen every night with a cron job and not worry about it. All I have to do is remember to put the tape in. Here's a cron entry for a backup that runs at 11 pm every night, Monday to Friday.
0 23 * * 1-5 /usr/local/.Admin/4mm.dataonly
While this may not seem like the greatest argument, tools are another point I will bring up in support of tapes. Since tapes are (for better or worse) the medium which the data protection world has grown up with, the vast majority of backup tools are designed to work with tape. These range from free to very expensive. One of the tools I've taken a liking to is something called taper (despite the fact that it is limited to 4GB per archive). While taper will work with things like a floppy drive or a disk file, it is really a tape tool. taper also has a nice ncurses screen.
Taper (written by Yusuf Nagree) can be found by surfing over to http://www.e-survey.net.au/taper/.
The installation is pretty simple. All I did was download the latest version (taper-6.9b.tar.gz) and follow a few steps. The page also contains a warning about making sure you have ncurses version 4.1 or better.
tar -xzvf taper-6.9b.tar.gz
cd taper-6.9b
make
make install
You then start taper with a command switch to define what medium you want to back up to. Yes, despite my ramblings about tape, I am mentioning a tool that is quite at home with a file on disk, a floppy drive, a Zip drive, or (you guessed it) a number of different tape drives.
You have options other than tape with this little program. Using command switches, you can define your destination. Here's a little sampling:
taper -T s # starts taper with a SCSI tape drive
taper -T r # starts taper with a floppy
The r type (as in -T r) can be further modified to use things like Zip drives, although I personally have not tried it.
The interface is simple and menu-driven. To back up your data, choose the "Backup Module". The software will identify your tape at the beginning, and ask you for an archive and volume title. For instance, I used "Web server archive" and "Volume 1" as my storage information. You are then presented with a list of directories, relative to where you started taper. You select files for inclusion in your backup with single keystrokes. i means to include the file (or directory) and u means to "un-include", if you change your mind. When you are done selecting, simply press f for "finish", and taper goes to work, with a running report of how far it has gone and how long it expects the whole process to take.
In order to restore, choose the Restore Module. You will be presented with a list of your backups and archives. Choose from the list, and press <return>. You'll be prompted for the directory where you wish to restore the archive (you might not want your files in precisely the same directory as you started). Choose the files you want to restore (in the same way as above), and press f when you're done.
Whoa! I've gone on way too long today. When next we reconvene at the Corner, I'll wrap up the current foray into backups with a tour of some of the other options out there for making backups easier, including some suggestions for Windows machines on your network. No! Nothing like that! Until we chat again, remember that when you're down, only a good backup will get you back up.