Cooking with Linux - Mirror, Mirror, of It All
François, what are you doing? When I asked you to mirror our Web sites, I did not mean that you should hold a mirror up to the screen. You can be very silly, mon ami. What I meant was that you should make a copy of our Web sites onto that other machine. François, what are you looking at? Ah, our guests have arrived! Why did you not tell me? Welcome, mes amis, to Chez Marcel, home of fine Linux fare and exceptional wines.
Speaking of wine, François! To the wine cellar, immédiatement. Please bring back the 1999 California Stag's Leap District Cabernet Sauvignon. This bold, smooth wine is the perfect mirror to today's menu. As you know, mes amis, the theme of this issue is system administration. On today's menu, we are going to sample a number of alternatives for mirroring data. The reasons for mirroring data are many. The obvious first reason is the not altogether sexy but extremely important subject of backups. Other reasons include creating mirrors of FTP sites for local network updates, such as your own RPM update repository, or mirroring Web sites for fast, off-line reading.
Many people who do regular backups are doing them to a disk on one of their other machines. Others still are backing up to a second disk on the same machine. Given that an extra hard drive added to a system is extremely inexpensive these days and high-capacity tape drives can cost substantially more, it isn't that unusual to find this kind of solution being used.
Backing up from one disk to the other, or creating a mirror of your data, can be as simple as doing a recursive copy using cp. For instance, if I wanted to copy everything in my home directory to a second disk with a lot of space, I might do the following:
cp -rfupv /home/mgagne /disk2/
As you probably expect, the -r option indicates a recursive copy (all the subdirectories), and the -v tells the command to be verbose. Because I don't want to be warned about each file being overwritten, I add -f to force the copy; the -p ensures that permissions, ownership and timestamps are preserved as well. Finally, the -u option tells the cp command to copy only files that are newer than the copy already on the second disk, or that are missing from it. This speeds up the process on subsequent copies.
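If you would like to watch -u at work before trusting it with your home directory, here is a small sketch that runs the same cp command twice in a scratch area. The directory names under the temporary directory and the two sample files are my own inventions for the demonstration, and it assumes GNU cp, as found on a typical Linux system.

```shell
#!/bin/sh
# Sketch only: scratch directories stand in for /home/mgagne and /disk2.
work=$(mktemp -d)
src="$work/home/mgagne"
dst="$work/disk2"
mkdir -p "$src/docs" "$dst"

# Two source files with an old timestamp (1 January 2004).
echo "soup" > "$src/docs/menu.txt"
echo "cabernet" > "$src/docs/wine.txt"
touch -t 200401010000 "$src/docs/menu.txt" "$src/docs/wine.txt"

# First pass: both files land in $dst/mgagne/docs with their old timestamps (-p).
cp -rfupv "$src" "$dst/"

# Change one source file (its timestamp becomes "now")...
echo "soup and salad" > "$src/docs/menu.txt"
# ...and edit the backup copy of the other, so the backup is the newer one.
echo "edited on the backup disk" > "$dst/mgagne/docs/wine.txt"

# Second pass: -u copies menu.txt (source is newer) and leaves wine.txt alone.
cp -rfupv "$src" "$dst/"
```

After the second pass, the backup of menu.txt holds the new text, while the backup of wine.txt keeps its own edit because the source was older. Remove the scratch area with rm -rf "$work" when you are done.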
It all works very well, but copying from machine to machine requires a few extra steps. With your Linux system, you actually have a lot of tools at your disposal beyond the humble cp. For starters, if you want to copy or back up an entire Web site, try the wget command, originally written by Hrvoje Niksic:
wget -m http://www.websitename.dom
Starting at the top of your chosen Web site, wget walks through the entire site, saving all appropriate HTML files and images. The -m in this case means mirror, but it also encompasses several other options, specifically -r, -N, -l inf and -nr. These options tell wget to do a recursive fetch, turn on timestamps, allow for an infinite number of levels and not to remove the FTP directory .listing files, respectively.
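To make the shorthand a little less mysterious, here is a sketch that spells out the options listed above in a small shell function. The name mirror_site is hypothetical, something I made up for the example, and the function should behave like wget -m only as long as -m keeps the meaning described here.

```shell
#!/bin/sh
# Sketch: mirror_site is a hypothetical helper name, not part of wget.
# It spells out the options that the text says -m stands for.
mirror_site() {
    # -r      recursive fetch
    # -N      timestamping: skip files that are not newer than the local copy
    # -l inf  no limit on recursion depth
    # -nr     keep the FTP directory .listing files
    wget -r -N -l inf -nr "$1"
}

# Example call (needs network access, so it is left commented out):
# mirror_site http://www.websitename.dom
```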
All files on the Web site are saved in a local directory with the same name as the Web site. In the example above, that would be www.websitename.dom. Add a new file to your Web server, run the command again and only that new file is transferred, thus making the job of keeping things up to date that much faster.
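If you script your mirrors, it can be handy to work out that directory name from the URL. The following is only a sketch: mirror_dir is a hypothetical helper of my own, and it assumes a simple URL with no user name, password or port number in it.

```shell
#!/bin/sh
# Sketch: mirror_dir is a hypothetical helper that guesses the directory
# wget creates for a simple URL (no user name, password or port number).
mirror_dir() {
    rest=${1#*://}                 # strip the scheme, e.g. "http://"
    printf '%s\n' "${rest%%/*}"    # keep everything before the first slash
}

# After a mirror run, list the most recently changed files in that directory:
# ls -lt "$(mirror_dir http://www.websitename.dom)" | head
```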
This is a great tool for its intended purpose, but its primary function is to deal with Web sites. It is possible, however, to use wget to download from FTP servers as well. If you are transferring from anonymous sites, the format is almost identical to the one used to mirror a Web site:
wget -m ftp://ftp.ftpsitename.dom
If, on the other hand, you want to back up a user directory where a user name and password are required, you need to be a little fancier:
wget -m ftp://username:password@ftp.ftpsitename.dom
This approach has a couple of downsides. First, your password is sent across the network in plain text, which may not be a big deal depending on how much you trust your network. In a pinch, you could do a recursive secure copy with the scp command. Because scp is part of OpenSSH, you have the advantage of knowing that you are using secure, encrypted file transfers. Pretend that you want to copy your whole Web site, starting from the Apache server root. It would look something like this:
scp -rpv /var/www root@remote_host:/mnt/backupdir
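If you find yourself typing that command often, you might wrap it in a small function. What follows is a sketch under my own assumptions: backup_tree is a hypothetical name, and the check for a missing source directory is an addition of mine rather than anything scp requires.

```shell
#!/bin/sh
# Sketch: backup_tree is a hypothetical wrapper around the scp call above.
backup_tree() {
    src=$1     # local directory to copy, e.g. /var/www
    dest=$2    # remote target, e.g. root@remote_host:/mnt/backupdir

    # Refuse to start if the source directory is missing.
    if [ ! -d "$src" ]; then
        echo "backup_tree: $src is not a directory" >&2
        return 1
    fi

    # -r recursive, -p keep modification times and modes, -v verbose
    scp -rpv "$src" "$dest"
}

# Example call (needs a reachable host, so it is left commented out):
# backup_tree /var/www root@remote_host:/mnt/backupdir
```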