Tape Machines

by Jim Hatridge

Tape drives are slow compared to disk drives, but can back up large quantities of data. Today, it's normal to have a 20GB hard drive on a home computer, and hard drives of 160GB or more can be found on our PCs. If you tried to back up such a 20GB hard drive on the common 1.44MB diskettes, you would need about 14,223 diskettes--not very practical. A tape drive, on the other hand, can store up to 40GB for about $15 per tape. The 14,223 diskettes would cost over $2,100 for the cheapest kind.

Because tape machines are comparatively slow, I start my backups at bedtime so they are done by the next morning. In fact, by leaving a tape in the drive, you can have cron do the job automagically whenever you want.

Some say that tapes currently are useless because the price of hard drives has dropped so much that it's as cheap to buy another hard drive for backup. For somebody who runs his own ISP, this might be true. But for someone with only a single system, or a small home network with only a couple of users, I think this is wrong. An 80GB tape costs about the same as an 80GB hard drive. But to back up an 80GB hard drive on a home network, you don't need an 80GB tape. When you stop and look at your hard drive, at least 50% of the data is system programming that is already stored on your distribution's CDs. This data does not need to be backed up.

The hard drive of my home network's main system has 6.5GB of space. It is about 70% full, which seems to be normal for a home system. Of this 4.5GB of data, at least 3.2GB are system and program files that came with my distribution or are from other CDs I have. Therefore, I have only about 1.3GB of data to worry about. These data files are, for example, e-mails, MP3s and configuration files. Even on very old 512MB tapes, it would take only two or three tapes to copy everything necessary.
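The diskette count and the tape count above are both the same kind of sum: divide the data by the size of one medium and round up. The sketch below shows that arithmetic in plain sh. The function name media_needed is made up for this illustration, and 1331MB is simply 1.3GB rounded to a whole number of megabytes.

```shell
#!/bin/sh
# media_needed DATA UNIT
# Prints how many media of size UNIT are needed to hold DATA.
# Both arguments must be whole numbers expressed in the same unit.
media_needed() {
    data=$1
    unit=$2
    # Integer ceiling division: a partly filled last medium still counts.
    echo $(( (data + unit - 1) / unit ))
}

# 20GB on 1.44MB diskettes, counted in hundredths of a megabyte
media_needed $((20 * 1024 * 100)) 144    # prints 14223

# About 1.3GB (1331MB) on 512MB tapes, without compression
media_needed 1331 512                    # prints 3
```

With the z option described later, the same 1.3GB may well fit on two tapes, which is where the "two or three" comes from.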

So far, I've recycled four tape drives, all of them SCSI types. I decided to set up one of the SCSI tapes on a node of my home network to learn how to back up data on my machines. On Penguin, Foe of Batman (I name my systems after famous penguins), I added an old SCSI 1510 controller card with a well-used QIC-1000 tape drive and a small 200MB hard drive (I added the hard drive only because I had it--I'm a pack rat at heart). The tape drive came with ten 525MB tapes. I recycled all this from about four different sources.

Hardware

To set up internal SCSI hardware, three things are needed: the device, a cable and the SCSI card. Most SCSI cards can run up to seven devices, normally hard drives, CD-ROM drives, tape drives, scanners or Zip drives (SCSI Zip drives are not common). Each device has an ID number; for example, my tape drive is number 3, and the hard drive is 0. The numbers 0 through 6 are available for devices. On internal SCSI devices, you will find jumpers to set the number. You need to check your device's manual for the correct jumper. On external devices (for example, scanners), you will find small dials with 0 to 6 on them. Simply dial in the correct number.

SCSI has eight ID numbers: 0 to 7. By convention, 0 is used for the hard drive; 1 through 4 are open; 5 is used for a CD-ROM drive; 6 is open; and 7 belongs to the SCSI card itself. You can choose any free number for a device, but normally a tape drive is number 3. In fact, the factory often will pre-set the number that is common for a particular device. So, when you have a new device without the manual and cannot figure out how to set the jumpers, just plug it in. Unless you have two (or more) of the same device, it should work without a problem. I don't know why, but not one of my four recycled tape drives has the jumpers marked.

The most common SCSI setup problem is getting the termination wrong. Terminators are electrical "traps" that keep signals from being reflected back along the SCSI cable. In order to work correctly, there needs to be one terminator at each end of the cable, and no terminators in between. SCSI cards typically have a terminator built in, so you can either turn on termination on the tape drive (if it is the end device) or add a terminator after it on the SCSI cable.

Software

Using the UNIX idea of several tools for getting a job done, you are even able to back up over your home network. In my example, I have a partition for MS-DOS on my main system. This partition is 250MB and holds about 150MB. I could go through this partition and mark which files or directories I want to back up, but it's not worth my time. So, I back up the whole thing. Also, by backing it all up, I have everything in case I need it.

To back up over a home network you need four programs on your system: tar, nfs, gzip and ssh.

If you want to have it done automatically, you also need cron. Luckily, all but NFS are part of a basic SuSE install, so I didn't have to worry about installing a lot of special programs.

tar is the main program for archiving data. It puts all your files into one huge file without compressing them. For instance, if you have 1MB of data in different files, after tarring them, you'll have a new single file with 1MB of data, plus your original files. You can compress the tar file with gzip. This can shrink your data by up to 60% and is also a standard program of the basic SuSE install. You can do a tar/gzip in two different ways: either tar your data first and then run gzip, or add the z option to tar and get a compressed backup in one step.
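The two routes to a compressed archive can be compared side by side. This is only a sketch: the function names two_step and one_step are invented for the example, and the directory and filenames in the usage note are placeholders.

```shell
#!/bin/sh
# two_step SOURCE ARCHIVE.tar
# Make an uncompressed archive first, then compress it with gzip.
two_step() {
    src=$1
    out=$2
    # gzip replaces ARCHIVE.tar with ARCHIVE.tar.gz;
    # -f overwrites an older .gz of the same name.
    tar cf "$out" "$src" && gzip -f "$out"
}

# one_step SOURCE ARCHIVE.tar.gz
# Let tar call gzip itself through the z option.
one_step() {
    tar czf "$2" "$1"
}
```

For example, two_step mydata backup.tar leaves backup.tar.gz behind, and one_step mydata backup.tar.gz produces the same kind of file in a single command. In both cases the original files in mydata stay where they are.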

A tape machine works differently from other mass-storage machines. When you want to write to or read from a hard drive, diskette or Zip drive, you must mount it first. A tape drive, on the other hand, is treated like a file. The problem here is that the tape drive is a device, and only root can open, read and write to devices. If you try to use the tape drive as a common user, you will get the following error message:

user@penguin:~ > tar -tvf /dev/tape
tar: /dev/tape: Cannot open: Permission denied
tar: Error is not recoverable: exiting now
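If you would rather find out before starting a long job, the shell can test the device file first. The helper below is a hypothetical sketch, not part of tar; check_device is a name chosen for this example, and /dev/tape is the device name used in this article.

```shell
#!/bin/sh
# check_device [DEVICE]
# Report whether the current user may read and write DEVICE (default /dev/tape).
check_device() {
    dev=${1:-/dev/tape}
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "$dev: readable and writable by $(id -un)"
    else
        echo "$dev: no access for $(id -un); try again as root" >&2
        # Show owner and permissions if the device file exists at all.
        ls -l "$dev" 2>/dev/null
        return 1
    fi
}
```

Running check_device as a common user should fail in the same situations in which tar reports "Permission denied".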

So, let's look at how to make a backup of your DOS directory. This is all done under root. I have two systems: Penguin, which has the tape drive, and Tenn, which is what I want to back up. First, I mount Tenn's filesystem via NFS to Penguin. Next, I issue this command on Penguin:

penguin:~ # tar cvzPf /dev/tape /Tenn/cdisk

Looking at my command, I am tarring everything on /Tenn/cdisk to the file /dev/tape. Let's look at this more closely. My source is a directory on my system Tenn. tar will copy everything under /Tenn/cdisk. The backup will be put on /dev/tape. In this example, tape is a device, but you also can back up to a file, for example, /dosbackup.july4.
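Because the destination is just a filename, the same command is easy to wrap in a small script that cron can run at bedtime. What follows is a sketch under assumptions: the script location /root/bin/tape-backup.sh, the log file /var/log/tape-backup.log, the function name backup_to and the 23:30 start time are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper script, saved for example as /root/bin/tape-backup.sh.
# A line like this in root's crontab would run it at 23:30 every night:
#   30 23 * * * /root/bin/tape-backup.sh /Tenn/cdisk /dev/tape

# backup_to SOURCE DEST [LOGFILE]
backup_to() {
    src=$1
    dest=$2
    log=${3:-/var/log/tape-backup.log}   # assumed log location
    # No v option here: nobody is watching, so only errors go to the log.
    if tar czPf "$dest" "$src" >>"$log" 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M') backup of $src to $dest finished" >>"$log"
    else
        echo "$(date '+%Y-%m-%d %H:%M') backup of $src to $dest FAILED" >>"$log"
        return 1
    fi
}

# Only act when called with arguments, so sourcing the file does nothing.
if [ "$#" -ge 2 ]; then
    backup_to "$@"
fi
```

A glance at the log in the morning tells you whether the night's backup went through.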

In our example, there are five options (cvzPf): c means create and is required; v is for verbose, which will list each file as it is backed up (you can do without this, but I find that it's better to see what's happening); z is for compress, which will gzip your backup file (not required, but you can get more on the tape with it); P means keep paths, which is good to use when you back up over a network (the default is to strip away the first / of each path, so that you can restore to any directory); and finally, f is required and tells tar to use the argument after it, the "file" /dev/tape, for its output, instead of writing to stdout. So, that's all it takes to save a directory to tape. The next question is, "What is on my tape?"
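Before turning to that question, the effect of P is easy to check on a scratch archive instead of a tape. The sketch below archives a throwaway directory twice and prints the names tar stored; compare_paths and the file names are invented for the example, and the note about a message on stderr applies to GNU tar.

```shell
#!/bin/sh
# compare_paths: archive one scratch directory twice and print the stored names.
compare_paths() {
    work=$(mktemp -d) || return 1
    mkdir "$work/cdisk"
    echo "sample" > "$work/cdisk/readme.txt"

    # With P the absolute path is stored unchanged.
    tar cPf "$work/with_P.tar" "$work/cdisk"
    # Without P tar drops the leading / (GNU tar says so on stderr).
    tar cf "$work/without_P.tar" "$work/cdisk" 2>/dev/null

    echo "with P:"
    tar tPf "$work/with_P.tar"
    echo "without P:"
    tar tf "$work/without_P.tar"

    rm -rf "$work"
}

compare_paths
```

The first listing should show names that start with /, the second the same names without it.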

There are several ways to list everything on your tape. Whatever way you choose, the options t and f are required. The t option means list. (You could also say --list, but let's not confuse everyone.) The f option means to use the following filename. This also means the f option should always be the last option, e.g., tf, not ft.

If you use only the tf options, you will get a listing that looks like the output below:

penguin:~ # tar tf /dev/tape
/Tenn/cdisk/
/Tenn/cdisk/1b0d6.vmm
/Tenn/cdisk/cent/
/Tenn/cdisk/cent/hist/
/Tenn/cdisk/cent/hist/hist.mil

This is okay, but by adding the v option, you will get a listing that looks like the normal UNIX directory listing. This is helpful if you need to know the backup's date. You don't need to restore a file with an older date than the one on your system. Sometimes you will need to know the owner of the file too.

penguin:~ # tar tvf /dev/tape
drwxrwxrwx root/root 0 1970-01-01 01:00:00 /Tenn/cdisk/
-rw-rw-rw- root/root 0 1997-03-22 10:20:08 /Tenn/cdisk/1b0d6.vmm
drwxrwxrwx root/root 0 1999-07-28 15:05:30 /Tenn/cdisk/cent/
drwxrwxrwx root/root 0 1999-07-28 15:05:40 /Tenn/cdisk/cent/hist/
-rw-rw-rw- root/root 1761910 1998-10-05 17:22:36 /Tenn/cdisk/cent/hist/hist.mil

The main problem with the two previous examples is that you might get hundreds, if not thousands, of filenames. By default, a SuSE system installs almost 50,000 files on your computer, and you can't see the forest for the trees. However, if you know the filename, or even part of the name, you can use grep to search for it, like this:

penguin:~ # tar tvf /dev/tape | grep linux
drwxrwxrwx root/root 0 2000-03-10 12:21:36 /Tenn/cdisk/linuxtxt/
drwxrwxrwx root/root 0 2000-03-10 14:45:28 /Tenn/cdisk/linuxtxt/cookbook/
-rw-rw-rw- root/root 1477 2000-03-10 12:21:50 /Tenn/cdisk/linuxtxt/cookbook/execstat.txt
-rw-rw-rw- root/root 5513 2000-03-10 12:22:02 /Tenn/cdisk/linuxtxt/cookbook/drowfact.txt
-rw-rw-rw- root/root 1105 2000-03-10 14:45:28 /Tenn/cdisk/linuxtxt/cookbook/intro.txt

tar lists all the filenames, and the shell pipes (|) that listing to the grep program. grep then searches each line for the word or words you are looking for. When a line matches, grep prints it to the screen; otherwise, the line is skipped. The grep program is powerful; read the man pages to learn more about it.

My system penguin takes about 15 minutes to read and grep a 525MB tape. If you are looking for three or four different files, you could fill a whole day simply reading the darn tape. To avoid this, build an index file on your hard disk by redirecting your tape listing, using the greater than symbol (>):

penguin:~ # tar tvf /dev/tape > tape.list

At this point, you can do grep linux tape.list and get the same output as tar tvf /dev/tape | grep linux in just a few seconds, compared with 15 minutes:

penguin:~ # time grep linux tape.list
real 0m0.444s
user 0m0.040s
sys 0m0.270s

penguin:~ # time tar tvf /dev/tape | grep linux
real 14m18.268s
user 0m2.200s
sys 0m5.690s

As you can see in the above examples, grepping the index file only took about half a second. On the other hand, grepping the tape itself took over 14 minutes.
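With ten tapes on the shelf, one index per tape pays off even more, because a single grep can then tell you which tape to load. The sketch below is one possible arrangement, not the only one: the directory $HOME/tape-index, the tape labels and the function names index_tape and find_on_tapes are assumptions made for the example. The -H option is found in GNU and BSD grep. If your tar does not detect a gzipped archive by itself, add z to the tar options.

```shell
#!/bin/sh
# Directory holding one listing per tape; the location is an assumption.
INDEX_DIR=${INDEX_DIR:-$HOME/tape-index}

# index_tape LABEL [ARCHIVE]
# Save the verbose listing of ARCHIVE (default /dev/tape) as LABEL.list.
index_tape() {
    label=$1
    archive=${2:-/dev/tape}
    mkdir -p "$INDEX_DIR" || return 1
    tar tvf "$archive" > "$INDEX_DIR/$label.list"
}

# find_on_tapes PATTERN
# Search every saved listing. grep -H puts the name of the list file in
# front of each hit, which tells you which tape holds the file.
find_on_tapes() {
    grep -H -- "$1" "$INDEX_DIR"/*.list
}
```

After index_tape tape01 with the first tape loaded, index_tape tape02 with the second, and so on, find_on_tapes linux searches all of them at hard disk speed.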

How Do I Restore a Directory or File?

Once you've found the file that you want to restore, it's easy. If you are restoring over a home network, you should use the P option. This keeps all the /s in place. The default is to strip the first / from the name of the file to be restored, which allows the system to put your restored file in whatever directory you happen to be in. For example, if you are in root's home directory on penguin and want to restore the file nt.tif to Tenn's cdisk, but happen to forget P, the file will end up under that home directory, as /root/Tenn/cdisk/pictures/nt.tif, instead of being on Tenn where you want it. The command below will restore a single file:

tar xvPf /dev/tape /Tenn/cdisk/pictures/nt.tif

The command

tar xvPf /dev/tape /Tenn/cdisk

will restore an entire directory. A few things to watch for with both of these commands: first, if you already have a file by the same name, tar will overwrite it, even if the restored file is older than the file on the hard drive. Also, if a directory is not on the hard drive, tar will create it to restore into.
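If overwriting worries you, one cautious habit is to unpack into a scratch directory first and copy the file into place only after looking at it. The sketch below uses tar's -C option for that. The function name restore_to_scratch and the directory /tmp/restore in the usage note are made up for the example, and the member name has to be spelled the way the tape listing shows it.

```shell
#!/bin/sh
# restore_to_scratch ARCHIVE MEMBER SCRATCH_DIR
# Unpack MEMBER from ARCHIVE below SCRATCH_DIR instead of its original place.
restore_to_scratch() {
    archive=$1
    member=$2
    scratch=$3
    mkdir -p "$scratch" || return 1
    # No P here, so tar strips any leading / and everything lands
    # below SCRATCH_DIR, leaving the files on the hard drive untouched.
    tar xvf "$archive" -C "$scratch" "$member"
}
```

For example, restore_to_scratch /dev/tape /Tenn/cdisk/pictures/nt.tif /tmp/restore would be expected to leave the file under /tmp/restore/Tenn/cdisk/pictures, where you can compare it with the current one using ls -l or diff before copying it over.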

In general, doing backups is simple, and I promise someday they will save your bacon! Happy tarring!
