The Skinny on Backups and Data Recovery, Part 2
This week, I will go out on a limb and make a gross generalization. If you're ready, here it is: an amazing number of Linux users and admins out there have CD-recordable or rewritable drives and no tape drive. You happily burn CDs of your favourite tunes, but have you thought about using that drive to back up your data?
A quick step back. Here's the scoop. There are right ways to do backups, but I want to work from the worst-case scenario toward the best case. That means we will soon be talking about tape drives. Some people out there might argue that "tape drives" sounds like an archaic way to deal with data storage. I'll answer that challenge next week. For now, I'm going to pretend you have a CD-recordable unit on your system, and show you how to use it for backups.
The question of whether a CD-RW is a good backup choice is sometimes settled this way: you can afford either a tape drive (sometimes more expensive than the CD-RW) or the CD-RW, but not both. When our machines are used for both business and pleasure, as is often the case in home offices, we tend to lean in the direction of "I want both".
Making collections of favourite songs and burning extra Slackware or Debian CDs is usually done with a program called cdrecord. cdrecord is not Linux-specific; it compiles and runs on a number of different platforms. If you don't already have cdrecord (maybe you just got a CD-RW last night), you can pick up the latest copy at this address:
http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/cdrecord.html
The other thing you will need if you want to use your drive is a little package called mkisofs. Luckily, the latest version of cdrecord already includes mkisofs, but it is also available separately at this address:
ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/
I'm going to work on the premise that you already have a working CD writer setup; that's why we're having this discussion. You got a good deal, the writer makes great song collections, and now you think using that same device for backups is a great idea.
The problem with doing backups using CD writers and rewriters is that they were never intended for that. Normally, a raw image, based on a mirror of the data you intend to capture, is written to your hard disk. You then write that image to your CD or CD-RW. Essentially, you need twice as much free space as the data you are trying to back up: one copy for the mirror and one for the image. In the case of a full CD at something like 650 meg, you must have 1.3 gigabytes of space. It's not the friendliest way to do backups (and certainly not the most space-friendly). Luckily, there's a way to cut the necessary space in half and still get your backups done. With a sufficiently fast system, you can pass the ISO9660 image data directly to the cdrecord command as mkisofs creates it. That means we don't need room for both the backup tree and an ISO image of it to burn onto the CD.
One way to do this is to beef up last week's identity backup script. If you remember, we created a temporary directory with a hierarchical "mirror" of our important data, then backed up that smaller mirror to a diskette. With that script, I gave you a very small list of files and suggested that your choice of what's "important" may be different from mine. After all, you can fit roughly 1.4 meg on a single diskette, and my example identity backup used only 37K.
With the CD-RW, we can increase that size to roughly 650 megabytes, which may be all you need for the things that change day to day. Remember the catch (or half-catch, now): you still need that spare 650 megabytes in which to recreate the structure you want to back up. We don't get off that easily. If you plan on backing up only 300 meg of data, then you'll need only 300 meg of space. Last week's example is worth remembering because it is a miniature version of what we are about to do here.
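Because you still need that spare room, it can be worth having the script check for it before building the mirror. The following is a minimal sketch, not part of the original script: the function name enough_space is hypothetical, and it assumes a df that supports the POSIX -P and -k options. With the piped mkisofs | cdrecord approach you need room for the mirror alone; with the two-pass approach, roughly double that.

```shell
#!/bin/bash
# enough_space DIR NEEDED_KB
# Hypothetical helper: succeeds (returns 0) if the filesystem holding DIR
# has at least NEEDED_KB kilobytes available, and fails otherwise
# (including when DIR does not exist).
enough_space() {
    local dir=$1 needed_kb=$2 avail_kb
    # -P forces one POSIX-format line per filesystem (no wrapping);
    # -k reports sizes in 1K blocks. Field 4 of line 2 is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$needed_kb" ]
}

# Example (commented out): make sure /mnt/data1 has room for a full
# 650 MB mirror before building it.
#   enough_space /mnt/data1 $((650 * 1024)) || { echo "Not enough space" >&2; exit 1; }
# For the two-pass method, ask for about twice as much: $((2 * 650 * 1024))
```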
We start by creating our mirror. In this case, it is a directory called /mnt/data1/data_backup. On my system, /mnt/data1 is a separate drive with a fair amount of free space. It's where I keep images of Linux CDs, which I then push on my friends in an effort to get them off that other operating system. On your system, the mirror will most likely be in a different location. Just make sure the space is available. Here is that script.
#!/bin/bash
# script name : backup_to_cd
# This script does a backup of important files onto the CD-RW
# Marcel Gagne, 2000
#
# NOTE: my "data mirror" is /mnt/data1/data_backup
#
echo "Starting by Blanking the data_backup area"
rm -rf /mnt/data1/data_backup
echo "Recreating the data_backup mirror ..."
mkdir /mnt/data1/data_backup
mkdir /mnt/data1/data_backup/usr
mkdir /mnt/data1/data_backup/etc
#
echo "Backing up to data1 disk mirror area ..."
# Stop if the cd fails, so find never runs from the wrong directory.
cd / || exit 1
find home -print | cpio -pduvm /mnt/data1/data_backup
find root -print | cpio -pduvm /mnt/data1/data_backup
# The relative path usr/local already includes "usr", so copy into the top
# of the mirror; copying into .../data_backup/usr would produce
# .../data_backup/usr/usr/local.
find usr/local -print | cpio -pduvm /mnt/data1/data_backup
#
echo "Backing up system identity."
cd /etc || exit 1
for ident_names in passwd group shadow profile bashrc sendmail.cw sendmail.cf hosts hosts.allow hosts.deny named.conf named.boot aliases
do
    cp -v "$ident_names" /mnt/data1/data_backup/etc
done
find nsdata -print | cpio -pduvm /mnt/data1/data_backup/etc
find sysconfig -print | cpio -pduvm /mnt/data1/data_backup/etc
find mgetty+sendfax -print | cpio -pduvm /mnt/data1/data_backup/etc
#
echo "All files saved. Ready to begin CD copy."
echo "Shall I blank the CD first?"
# This read only pauses the script; the answer is not checked.
read the_answer
#
cdrecord -blank=fast dev=3,0
#
echo "Shall I start the CD burn now?"
read the_answer
#
# mkisofs writes the image to stdout; the final "-" makes cdrecord read it from stdin.
mkisofs -R /mnt/data1/data_backup | cdrecord -v dev=3,0 -
Notice that in my /mnt/data1/data_backup mirror, I am capturing /home, /usr/local and /root, none of which I was paying much attention to with my original identity_backup script. After creating our mirror, we immediately burn the data to CD.
Yes? Ah. The reader in the back has a good point. I'm not really doing anything at those prompts for blanking and copying the CD (where it says read the_answer) other than pausing. Since the amount of data in my mirror can be pretty dynamic, not to mention downright huge, I want an opportunity to do a du -sk /mnt/data1/data_backup to verify that I'm staying within that 650 meg limit.
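If you would rather have the script make that du -sk check itself, and have the prompts actually decide something, the helpers below are one way to do it. This is a sketch rather than the original script: mirror_fits and confirm are hypothetical names, and the default limit of 650 * 1024 KB assumes a 650 MB disc.

```shell
#!/bin/bash
# mirror_fits DIR [LIMIT_KB]
# Hypothetical helper: succeeds if du -sk reports DIR at or under LIMIT_KB
# kilobytes (default 650 * 1024, i.e. a 650 MB CD). Fails if DIR is missing.
mirror_fits() {
    local dir=$1 limit_kb=${2:-$((650 * 1024))} used_kb
    # du -sk prints "SIZE<TAB>DIR"; keep only the size in kilobytes.
    used_kb=$(du -sk "$dir" | cut -f1)
    [ -n "$used_kb" ] || return 1
    echo "Mirror uses ${used_kb} KB of ${limit_kb} KB allowed." >&2
    [ "$used_kb" -le "$limit_kb" ]
}

# confirm PROMPT
# Hypothetical helper: prints PROMPT, reads one line from standard input,
# and succeeds only if the answer starts with y or Y. Anything else,
# including end of input, counts as "no".
confirm() {
    local answer
    printf '%s [y/N] ' "$1" >&2
    read -r answer || return 1
    case $answer in
        [yY]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use inside backup_to_cd (commented out):
#   mirror_fits /mnt/data1/data_backup || exit 1
#   if confirm "Shall I blank the CD first?"; then
#       cdrecord -blank=fast dev=3,0
#   fi
#   confirm "Shall I start the CD burn now?" || exit 0
```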
Back to the script. Since I am using a rewritable CD, I blank it before starting. This is done with the line
cdrecord -blank=fast dev=3,0
I use the -blank=fast option to quickly erase the table of contents from the disk. You have the option of blanking the entire disk, but that can take a long time.
The real magic happens at the end of the script with mkisofs and cdrecord. The -R option on mkisofs means I want the Rock Ridge extensions to be used. In other words, the CD carries a UNIX-style file system, with user and group information, long filename support, and so on. The lone - at the end of the cdrecord command tells it to read the image from standard input, which is what lets mkisofs feed it directly through the pipe. That's about it. One other thing, though: just as in our last example, this script is meant to be a jumping-off point for your own CD backup. What I consider important in my backup may differ wildly from yours.
For the curious, here is the normal chain of events in creating a CD. mkisofs writes out an ISO9660 image, which cdrecord then records onto the CD, so the backup happens in two passes. For instance, the commands would be:
# First we create the image based on a previously done "mirror" backup
mkisofs -o /another_dir/image.iso -R /mnt/data1/data_backup
# Now, we record the image to CD
cdrecord -v dev=3,0 /another_dir/image.iso
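One advantage of the two-pass method is that you can check the finished image before committing it to disc. A minimal sketch follows, assuming the image path used above; the function name image_fits is hypothetical, and the default limit of 650 * 1024 * 1024 bytes is only an approximation of a 650 MB CD.

```shell
#!/bin/bash
# image_fits IMAGE [LIMIT_BYTES]
# Hypothetical helper: succeeds if IMAGE exists and is no larger than
# LIMIT_BYTES (default 650 * 1024 * 1024 bytes).
image_fits() {
    local image=$1 limit=${2:-$((650 * 1024 * 1024))} size
    [ -f "$image" ] || { echo "No such image: $image" >&2; return 1; }
    # Reading from stdin makes wc print only the byte count; the
    # arithmetic expansion strips any padding some wc versions add.
    size=$(( $(wc -c < "$image") ))
    echo "Image is $size bytes (limit $limit)." >&2
    [ "$size" -le "$limit" ]
}

# Example two-pass use (commented out; paths as in the commands above):
#   mkisofs -o /another_dir/image.iso -R /mnt/data1/data_backup &&
#   image_fits /another_dir/image.iso &&
#   cdrecord -v dev=3,0 /another_dir/image.iso
```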
Incidentally, if you have a few ISO images hanging around on your disk (spare Linux CDs for your friends, etc.), you can mount those images and navigate them as you would any CD filesystem. Here's how. Pretend your debian.iso image is sitting out there on your disk, and you want to look at it. First, you create a mount point (mkdir /mnt/debdist). Next, using this command, you can mount the image:
mount debian.iso -r -t iso9660 -o loop /mnt/debdist
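The -r option mounts the image read-only, and -o loop tells mount to access the file through a loop device, as if it were a disk. As you can see, the result is very much like a mounted CD filesystem.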
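A loop mount also makes a handy verification step: mount the backup image (or the burned CD itself) and compare it with the mirror it was built from. Because the image was made with -R, the file names should match the originals. The sketch below assumes a diff that supports -r and -q (GNU diff does); same_tree and the mount point /mnt/isocheck are hypothetical, and since mounting requires root, the mount commands appear only as comments. Note that diff compares names and contents only, not ownership or permissions.

```shell
#!/bin/bash
# same_tree DIR1 DIR2
# Hypothetical helper: succeeds if both directory trees contain the same
# file names with the same contents. -r recurses into subdirectories;
# -q reports only which files differ, not how.
same_tree() {
    diff -rq "$1" "$2"
}

# Example use as root (commented out):
#   mkdir -p /mnt/isocheck
#   mount /another_dir/image.iso -r -t iso9660 -o loop /mnt/isocheck
#   same_tree /mnt/data1/data_backup /mnt/isocheck && echo "Backup verified."
#   umount /mnt/isocheck
```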
Next time, I'll take up that challenge from earlier, and show you why tapes are still where it's at, and just how flexible those "old-fashioned" devices can be. On that note, we wrap up another week here at the corner. Until we chat again, remember that when you're down, only a good backup will get you back up.
email: ljeditors@ssc.com
Comments
Linux Disaster Recovery Software
Thanks for the useful information. Linux Backup and Disaster Recovery is a major issue for system admins.
Re: The Skinny on Backups and Data Recovery, Part 2
This is a great series on making good use of automating backups. I found that piping mkisofs into cdrecord caused errors though. I simply added an extra line to call cdrecord by itself and the problem is solved.
Re: The Skinny on Backups and Data Recovery, Part 2
I can't get past the "dir/foo and dir2/foo have the same Rock Ridge name" error. I have read the man page 100 times because Jörg Schilling made a comment that this is a _documented_ bug, but I can't find the documentation anywhere.
Can anyone tell me what version of mkisofs and what options will let me backup a unix filesystem that *may* contain the same named file in different directories?
Please feel free to email me at
ispconsultant AT yahoo DOT com
Thanks