Tape Machines

Get started making tape backups with tar

Tape drives are slow compared to disk drives, but they can back up large quantities of data. Today, it's normal to have a 20GB hard drive on a home computer, and hard drives of 160GB or more can be found on our PCs. If you tried to back up such a 20GB hard drive on the common 1.44MB diskettes, you would need about 14,223 diskettes--not very practical. A tape drive, on the other hand, can store up to 40GB for about $15 per tape. The 14,223 diskettes would cost over $2,100 for the cheapest kind.

Because tape machines are comparatively slow, I start my backups at bedtime so they are done by the next morning. In fact, by leaving a tape in the drive, you can have cron do the job automagically whenever you want.

Some say that tapes currently are useless because the price of hard drives has dropped so much that it's as cheap to buy another hard drive for backup. For somebody who runs his own ISP, this might be true. But for someone with only a single system, or a small home network with only a couple of users, I think this is wrong. An 80GB tape costs about the same as an 80GB hard drive. But to back up an 80GB hard drive on a home network, you don't need an 80GB tape. When you stop and look at your hard drive, at least 50% of the data is system programs that are already stored on your distribution's CDs. This data does not need to be backed up. The hard drive of my home network's main system has 6.5GB of space. It is about 70% full, which seems to be normal for a home system. Of this 4.5GB of data, at least 3.2GB are system and program files that came with my distribution or are from other CDs I have. Therefore, I have only about 1.3GB of data to worry about. These data files are, for example, e-mails, MP3s, configuration files, etc. Even on very old 512MB tapes, it would only take two or three tapes to copy everything necessary.
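The capacity figures quoted so far can be checked with a few lines of shell arithmetic. This is only a sketch: it assumes 1GB is 1024MB, uses whole-number math, and the helper name ceil_div is made up for the illustration.

```shell
#!/bin/sh
# Whole-number arithmetic only; sizes are in MB, with 1GB taken as 1024MB.

# ceil_div A B: A divided by B, rounded up (hypothetical helper).
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# A 20GB drive on 1.44MB diskettes. Both sizes are scaled by 100
# so that 1.44 becomes the integer 144.
diskettes=$(ceil_div $((20 * 1024 * 100)) 144)

# About 1.3GB of personal data on 512MB tapes, without compression.
personal_mb=$((13 * 1024 / 10))
tapes=$(ceil_div "$personal_mb" 512)

echo "diskettes needed: $diskettes"
echo "512MB tapes needed: $tapes"
```

Without compression the data needs three tapes; how much gzip saves depends on the files, which is presumably why the estimate is "two or three".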

So far, I've recycled four tape drives, all of them SCSI types. I decided to set up one of the SCSI tape drives on a node of my home network to learn how to back up data on my machines. On Penguin, Foe of Batman (I name my systems after famous penguins), I added an old SCSI 1510 controller card with a well-used QIC-1000 tape drive and a small 200MB hard drive (I added the hard drive only because I had it--I'm a pack rat at heart). The tape drive came with ten 525MB tapes. I recycled all this from about four different sources.

Hardware

To set up internal SCSI hardware, three things are needed: the device, a cable and the SCSI card. Most SCSI cards can run up to seven devices, normally hard drives, CD-ROM drives, tape drives, scanners or Zip drives (SCSI Zip drives are not common). Each device has an ID number; for example, my tape drive is number 3, and the hard drive is 0. The numbers 0 through 6 are available for use. On internal SCSI devices, you will find jumpers to set the number. You need to check your device's manual for the correct jumper. On external devices (for example, scanners), you will find small dials with 0 to 6 on them. Simply dial in the correct number.

You can choose any number, but normally a tape drive is number 3. In fact, the factory often will pre-set the number that is common for a particular device. So, when you have a new device without the manual and cannot figure out how to set the jumpers, just plug it in. Unless you have two (or more) of the same device, it should work without a problem. I don't know why, but not one of my four recycled tape drives has the jumpers marked. SCSI has eight ID numbers: 0 to 7. By convention, zero is reserved for the hard drive; 1 through 4 are open; 5 is reserved for a CD-ROM drive; 6 is open; and 7 is exclusively for the SCSI card.

The most common SCSI setup problem is getting the termination wrong. Terminators are electrical "traps" that keep signals from being reflected back along the SCSI cable. In order to work correctly, there needs to be one terminator at each end of the cable, and no terminators in between. SCSI cards typically have a terminator built in, so you can either turn on termination on the tape drive (if it is the end device) or add a terminator after it on the SCSI cable.

Software

Using the UNIX idea of several tools for getting a job done, you are even able to back up over your home network. In my example, I have a partition for MS-DOS on my main system. This partition is 250MB and holds about 150MB. I could go through this partition and mark which files or directories I want to back up, but it's not worth my time. So, I back up the whole thing. Also, by backing it all up, I have everything in case I need it.

To back up over a home network you need four programs on your system: tar, nfs, gzip and ssh.

If you want to have it done automatically, you also need cron. Luckily, all but NFS are part of a basic SuSE install, so I didn't have to worry about installing a lot of special programs.

tar is the main program for archiving data. It puts all your files into one huge file without compressing them. For instance, if you have 1MB of data in different files, after tarring them, you'll have a new single file with 1MB of data, plus your original files. You can compress the tar file with gzip, which can shrink your data by up to 60% and is also a standard program of the basic SuSE install. You can do a tar/gzip in two different ways: either tar your data first and then run gzip, or add the z option to tar and get a compressed backup in one step.
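The two ways of getting a compressed archive can be tried side by side. The sketch below works in a scratch directory with made-up sample files (data, a.txt and b.txt are assumptions for the demo, not part of my setup) and writes to ordinary files rather than to a tape.

```shell
#!/bin/sh
set -e

# Scratch area with two small sample files; removed when the script exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data"
echo "hello tape" > "$work/data/a.txt"
echo "more data"  > "$work/data/b.txt"
cd "$work"

# Way 1: tar first, then gzip.
# gzip replaces two-step.tar with two-step.tar.gz.
tar cf two-step.tar data
gzip two-step.tar

# Way 2: let tar call gzip itself through the z option.
tar czf one-step.tar.gz data

# Both archives should list the same files.
list_two=$(tar tzf two-step.tar.gz)
list_one=$(tar tzf one-step.tar.gz)
[ "$list_two" = "$list_one" ] && echo "both archives hold the same files"
```

The one-step form is the one used for the tape commands in the rest of this article, because a tape device cannot be compressed in place afterward.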

A tape machine works differently from other mass-storage machines. When you want to write to or read from a hard drive, diskette or Zip drive, you must mount it first. A tape drive, on the other hand, is treated like a file. The problem here is that the tape drive is a device, and device permissions normally allow only root to open, read and write to it. If you try to use the tape drive as a common user, you will get the following error message:

    user@penguin:~ > tar -tvf /dev/tape
    tar: /dev/tape: Cannot open: Permission denied
    tar: Error is not recoverable: exiting now
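A script can check for this situation before tar ever runs. The following is a minimal sketch; the function name can_use_tape and the TAPE variable are assumptions made for the example, with /dev/tape as the default because that is the device used here.

```shell
#!/bin/sh
# TAPE is an assumed variable name; it defaults to the device used in this article.
TAPE=${TAPE:-/dev/tape}

# can_use_tape DEVICE: succeed only if the current user may both
# read and write DEVICE (hypothetical helper).
can_use_tape() {
    [ -r "$1" ] && [ -w "$1" ]
}

if can_use_tape "$TAPE"; then
    echo "OK: $(id -un) can read and write $TAPE"
else
    echo "No access to $TAPE as $(id -un); run the backup as root" >&2
fi
```

On a machine without a tape drive the script simply reports that there is no access, since a missing device is neither readable nor writable.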

So, let's look at how to make a backup of your DOS directory. This is all done under root. I have two systems: Penguin, which has the tape drive, and Tenn, which is what I want to back up. First, I mount Tenn's filesystem via NFS to Penguin. Next, I issue this command on Penguin:

    penguin:~ # tar cvzPf /dev/tape /Tenn/cdisk

Looking at my command, I am tarring everything on /Tenn/cdisk to the file /dev/tape. Let's look at this more closely. My source is a directory on my system Tenn. tar will copy everything under /Tenn/cdisk. The backup will be put on /dev/tape. In this example, tape is a device, but you also can back up to a file, for example, /dosbackup.july4.
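Backing up to a file instead of the tape works the same way, which also makes it easy to practice without a tape in the drive. The sketch below is an illustration only: the function name backup_to_file, the dated file name and the sample directory are assumptions, GNU tar is assumed, and the v option is left out so that the function prints nothing but the archive name.

```shell
#!/bin/sh
set -e

# backup_to_file SRC DEST_DIR: archive SRC into
# DEST_DIR/backup.YYYY-MM-DD.tar.gz and print the archive name.
backup_to_file() {
    src=$1
    archive="$2/backup.$(date +%Y-%m-%d).tar.gz"
    tar czPf "$archive" "$src"
    echo "$archive"
}

# Demo: a scratch directory stands in for /Tenn/cdisk.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/cdisk" "$work/backups"
echo "sample" > "$work/cdisk/autoexec.bat"

archive=$(backup_to_file "$work/cdisk" "$work/backups")
echo "wrote $archive"

# List what went into the archive (P keeps the leading / in the names).
tar tzPf "$archive"
```

Replacing the archive name with /dev/tape gives the tape command shown above, minus the v option.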

In our example, there are five options (cvzPf): c means create and is required; v is for verbose, which will list each file as it is backed up (you can do without this, but I find that it's better to see what's happening); z is for compress, which will gzip your backup file (not required, but you can get more on the tape with it); P means keep paths, which is good to use when you back up over a network (the default is to strip away the first / of the backup, so that you can restore to any directory, making it easier to restore over the net); and finally, f is required and tells tar to use the argument after it, the "file" /dev/tape, for its output, instead of writing to stdout. So, that's all it takes to save a directory to tape. The next question is, "What is on my tape?"

There are several ways to list everything on your tape. Whatever way you choose, the options t and f are required. The t option means list. (You could also say --list, but let's not confuse everyone.) The f option means to use the following filename. This also means the f option should always be the last option, e.g., tf, not ft.

If you use only the tf options, you will get a listing that looks like the output below:

    penguin:~ # tar tf /dev/tape
    /Tenn/cdisk/
    /Tenn/cdisk/1b0d6.vmm
    /Tenn/cdisk/cent/
    /Tenn/cdisk/cent/hist/
    /Tenn/cdisk/cent/hist/hist.mil

This is okay, but by adding the v option, you will get a listing that looks like the normal UNIX directory listing. This is helpful if you need to know the backup's date. You don't need to restore a file with an older date than the one on your system. Sometimes you will need to know the owner of the file too.

    penguin:~ # tar tvf /dev/tape
    drwxrwxrwx root/root 0 1970-01-01 01:00:00 /Tenn/cdisk/
    -rw-rw-rw- root/root 0 1997-03-22 10:20:08 /Tenn/cdisk/1b0d6.vmm
    drwxrwxrwx root/root 0 1999-07-28 15:05:30 /Tenn/cdisk/cent/
    drwxrwxrwx root/root 0 1999-07-28 15:05:40 /Tenn/cdisk/cent/hist/
    -rw-rw-rw- root/root 1761910 1998-10-05 17:22:36 /Tenn/cdisk/cent/hist/hist.mil

The main problem with the two previous examples is that you might get hundreds, if not thousands, of filenames. By default, a SuSE system installs almost 50,000 files on your computer, and you can't find the forest for the trees. However, if you know the filename, or even part of the name, you can use grep to search for it, like this:

    penguin:~ # tar tvf /dev/tape | grep linux
    drwxrwxrwx root/root 0 2000-03-10 12:21:36 /Tenn/cdisk/linuxtxt/
    drwxrwxrwx root/root 0 2000-03-10 14:45:28 /Tenn/cdisk/linuxtxt/cookbook/
    -rw-rw-rw- root/root 1477 2000-03-10 12:21:50 /Tenn/cdisk/linuxtxt/cookbook/execstat.txt
    -rw-rw-rw- root/root 5513 2000-03-10 12:22:02 /Tenn/cdisk/linuxtxt/cookbook/drowfact.txt
    -rw-rw-rw- root/root 1105 2000-03-10 14:45:28 /Tenn/cdisk/linuxtxt/cookbook/intro.txt

tar lists all the filenames, and the shell pipes (|) that listing to the grep program. grep then searches each line for the word or words you are looking for. When a line matches, grep prints it to the screen; otherwise, the line is skipped. The grep program is powerful; read the man pages to learn more about it.

My system penguin takes about 15 minutes to read and grep a 525MB tape. If you are looking for three or four different files, you could fill a whole day simply reading the darn tape. To avoid this, build an index file on your hard disk by redirecting your tape listing, using the greater than symbol (>):

    penguin:~ # tar tvf /dev/tape > tape.list

At this point, you can do grep linux tape.list and get the same output as tar tvf /dev/tape | grep linux in just a few seconds, compared with 15 minutes:

    penguin:~ # time grep linux tape.list
    real 0m0.444s
    user 0m0.040s
    sys 0m0.270s
    penguin:~ # time tar tvf /dev/tape | grep linux
    real 14m18.268s
    user 0m2.200s
    sys 0m5.690s

As you can see in the above examples, grepping the index file only took about half a second. On the other hand, grepping the tape itself took over 14 minutes.
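The backup and the index can be combined into one script that cron runs at bedtime. What follows is a sketch under stated assumptions, not a finished tool: the function names backup_and_index and find_in_index, the script path in the crontab comment and the sample files are all invented for the example, GNU tar is assumed, and a plain file stands in for the tape so the demo runs anywhere.

```shell
#!/bin/sh
set -e

# backup_and_index ARCHIVE SOURCE INDEX
# Write SOURCE to ARCHIVE, then read the archive back once and keep
# the listing in INDEX for fast searching later.
backup_and_index() {
    tar czPf "$1" "$2"
    tar tvzPf "$1" > "$3"
}

# find_in_index INDEX PATTERN: search the saved listing, not the tape.
find_in_index() {
    grep -- "$2" "$1"
}

# Demo: scratch files stand in for /Tenn/cdisk and /dev/tape.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/cdisk/linuxtxt"
echo "notes" > "$work/cdisk/linuxtxt/intro.txt"
echo "other" > "$work/cdisk/readme.txt"

backup_and_index "$work/tape.tar.gz" "$work/cdisk" "$work/tape.list"
matches=$(find_in_index "$work/tape.list" linuxtxt)
echo "$matches"

# On the real system the call would look like this:
#   backup_and_index /dev/tape /Tenn/cdisk /root/tape.list
# and a line in root's crontab (hypothetical script path) would start
# it every night at 23:30:
#   30 23 * * * /root/nightly-backup.sh
```

Reading the tape back right after writing it roughly doubles the running time, but since the job runs overnight, that cost is easy to accept, and the index is ready in the morning.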

______________________

Comments

Re: Tape Machines

I have been using tar and tapes on small systems since 1985. The biggest problem with tar is that it does not verify the tape. There is nothing worse, IMHO, than needing that backup and finding out that it is unreadable.

That is why I have dropped tar in favor of more robust backup programs like 'bru' (my personal favorite).

MikeD

Re: Tape Machines

The GNU version of tar has the "--verify" command argument to allow you to verify the archive after it is written. Or you can always read it back immediately after it is written.

Re: Tape Machines

What about dump and restore? You can verify the archive, and you can interactively walk the tape looking for the files you require. I have used them for years on many different *nix systems without issues.

tape restore, how will it work..?

khaleel

Hi,

We've been using dump and restore for UNIX backups and restores through tapes. No problems so far. But I couldn't find a procedure to restore Linux through tapes. Dump works well for backup, but I couldn't restore. It comes up with problems relating to LVM. Can anybody please help me find the actual procedure to follow to restore Red Hat Linux 4?

Thanks in advance

Re: Tape Machines

The problem with BRU is that it is expensive. You might want to take a look at Bacula on SourceForge. It has pretty much the same features except that it comes with the source code and a GPL license. :-)

Re: Tape Machines

Amen, brother. Got the old "Unexpected EOF" from a tape/tar restore just this week, trying to recover my OWN home directory after a hardware blowout. Man, was I bummed out . . . . At least I have a CD backup from a few months ago.

Re: Tape Machines are not dead

After reading your article, I find myself questioning if you meant things the other way around. For instance, an ISP would use a tape drive as a backup source, whereas a home user should be using something a little more cost effective, such as a CD or DVD burner. You mentioned an 80GB tape costs the same as a hard drive, but you didn't specify IDE or SCSI (a significant difference), and furthermore, a tape drive capable of using an 80GB tape would cost more than an entire PC and then some.

I notice you give instructions on setting up cron jobs to complete backups, which can also be used to back up data onto CD burners etc., not just tapes.

There are many arguments to both sides of the backup media, but it eventually all comes down to specific needs. In reading some of the commentary to your article, someone mentions RAID, which is a very useful tool for data recovery in the event of a failed hard drive or controller but should not be relied upon as a backup.

In closing, I will say that these are great topics to discuss, and I will be checking up to see where this discussion goes.

Thanks for hearing my thoughts,

Patrick M. Dorn

Re: Tape Machines

Another way you can protect your data: RAID. As you said, disks are cheap. You still need to do backups so you can have off site media.

If you need archival media, be careful. How long will that Travan drive be around? Can you get a replacement to read your archives?

Re: Tape Machines

RAID doesn't protect against accidents, e.g. "rm -rf ~ /to_trash" (there's an accidental space between ~ and /), worms, malware, heavy lightning (burning the whole PC), fire and theft.

Only tape provides good backup, but backups are not popular among home users, so we don't have cheap and good backup units available (an OnStream drive costs about the same as a cheap PC).

Re: Tape Machines

Great point! What I do as a home user/experimenter, is use xcdroast-.098alpha10 and cd-rw media to make data backups of my primary work space, /home. This gives me some degree of comfort, and not only that, it works!

Re: Tape Machines - To Tape or Not to Tape. ???

I run a small home network as well. Of the 5 machines on my network, the lowest capacity machine has only 100GB of HD space. All the others have more. I'm approaching a terabyte of total HD storage with about 60% utilization.

My environment doesn't lend itself well to tape backup in the capacities I need. So my whole approach to backup is a bit different.

1) As you rightly noted in your article, don't back up anything you already have a copy of. For me, these are usually original CDs of software I've either purchased or downloaded (GPLed). I'll usually make an archival CD copy on installation.

2) When creating new static data, make backup copies immediately. What is 'static data'? One example is my entire ripped 500-CD music collection. As I ripped the CDs to one of my hard drives, I burned copies of the resultant MP3 files onto archival CDs. Another example is the several ISO images I have on my network through an ISO image server.

3) Back up 'critical data' from all systems to a dedicated drive. This data typically includes all of my email and personal files. The funny thing is that the grand sum total of this information is less than 2 CDs' worth. So I back up the backup HD to CD once a week.

In my scheme of things, tape is just too much of a hassle and expense. CDs are cheap and sometimes even free (with rebates). And the data which needs to be backed up regularly fits comfortably on 2 CDs.

BTW - I'm going through the entire CD ripping (that's right - all 500+ of my CDs) process again. This time, I'm ripping to Ogg Vorbis format at a 192Kbps bit rate.

The RIAA and its members are fools.

Re: Tape Machines

The ole tar command, it's been around forever, hasn't it?
