Cooking with Linux - Mirror, Mirror, of It All

Make simple backups and keep every copy of your Web or FTP site up to date with some standard tools that probably are already on your system.

François, what are you doing? When I asked you to mirror our Web sites, I did not mean that you should hold a mirror up to the screen. You can be very silly, mon ami. What I meant was that you should make a copy of our Web sites onto that other machine. François, what are you looking at? Ah, our guests have arrived! Why did you not tell me? Welcome, mes amis, to Chez Marcel, home of fine Linux fare and exceptional wines.

Speaking of wine, François! To the wine cellar, immédiatement. Please bring back the 1999 California Stag's Leap District Cabernet Sauvignon. This bold, smooth wine is the perfect mirror to today's menu. As you know, mes amis, the theme of this issue is system administration. On today's menu, we are going to sample a number of alternatives for mirroring data. The reasons for mirroring data are many. The obvious first reason is the not altogether sexy but extremely important subject of backups. Other reasons include creating mirrors of FTP sites for local network updates, such as your own RPM update repository, or mirroring Web sites for fast, off-line reading.

Many people who do regular backups are doing them to a disk on one of their other machines. Others still are backing up to a second disk on the same machine. Given that an extra hard drive added to a system is extremely inexpensive these days and high-capacity tape drives can cost substantially more, it isn't that unusual to find this kind of solution being used.

Backing up from one disk to the other, or creating a mirror of your data, can be as simple as doing a recursive copy using cp. For instance, if I wanted to copy everything in my home directory to a second disk with a lot of space, I might do the following:

cp -rfupv /home/mgagne /disk2/

As you probably expect, the -r option indicates a recursive copy (all the subdirectories), and the -v tells the command to be verbose. I add -f to force the copy, so that a destination file that cannot be opened for writing is removed and replaced instead of being skipped with an error; the -p preserves each file's permissions, ownership and timestamps. Finally, the -u option tells the cp command to copy only files that are newer than the copy already at the destination, or that are missing there. This speeds up the process on subsequent copies.
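To see the effect of -u without touching real data, here is a minimal sketch that runs the same cp command inside a throwaway directory. It assumes GNU cp, as found on a typical Linux system; the work variable, the docs subdirectory and the menu.txt file are hypothetical names invented for the demonstration.

```shell
#!/bin/sh
# Sketch: exercise cp -rfupv in a scratch area (all paths are made up).
set -eu
work=$(mktemp -d)
mkdir -p "$work/home/mgagne/docs" "$work/disk2"
echo "first draft" > "$work/home/mgagne/docs/menu.txt"
# Give the source file an old timestamp so a later edit is clearly newer.
touch -t 200401010000 "$work/home/mgagne/docs/menu.txt"

# First run: the destination is empty, so everything is copied.
cp -rfupv "$work/home/mgagne" "$work/disk2/"

# Second run: -p kept the original timestamp on the copy, so -u sees
# nothing newer in the source and leaves the destination file alone.
cp -rfupv "$work/home/mgagne" "$work/disk2/"

# Edit the source (its timestamp becomes "now"), then mirror again.
echo "second draft" > "$work/home/mgagne/docs/menu.txt"
cp -rfupv "$work/home/mgagne" "$work/disk2/"

# The copy under disk2 now holds the updated text.
cat "$work/disk2/mgagne/docs/menu.txt"
echo "scratch area left in $work; remove it when you are done"
```

Note that the copy lands in a mgagne directory under the destination, just as /home/mgagne ends up as /disk2/mgagne in the command above.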

It all works very well, but copying from machine to machine requires a few extra steps. With your Linux system, you actually have a lot of tools at your disposal beyond the humble cp. For starters, if you want to copy or back up an entire Web site, try the wget command, originally written by Hrvoje Niksic:

wget -m http://www.websitename.dom

Starting at the top of your chosen Web site, wget walks through the entire site, saving all appropriate HTML files and images. The -m in this case means mirror, and it is shorthand for several other options, specifically -r, -N, -l inf and -nr (spelled --no-remove-listing in current wget releases). These options tell wget to do a recursive fetch, turn on timestamping, allow for an infinite number of levels and not remove the FTP directory .listing files, respectively.

All files on the Web site are saved in a local directory with the same name as the Web site. In the example above, that would be www.websitename.dom. Add a new file to your Web server, run the command again and only that new file is transferred, thus making the job of keeping things up to date that much faster.
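If you would rather keep your mirrors somewhere other than the current directory, the following sketch spells out the long-hand form of -m and adds wget's -P option to choose the parent directory. It assumes a current wget release that accepts --no-remove-listing. The names mirror_site, MIRROR_DIR and DRY_RUN are hypothetical, and the script defaults to printing the command rather than running it, so you can inspect it first.

```shell
#!/bin/sh
# Sketch: mirror a site into a chosen directory, spelling out what -m implies.
# mirror_site, MIRROR_DIR and DRY_RUN are made-up names for illustration.
MIRROR_DIR=${MIRROR_DIR:-/tmp/mirrors}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the command, anything else = run it

mirror_site() {
    url=$1
    # -r -N -l inf --no-remove-listing is the long-hand form of -m;
    # -P names the directory under which the site's own folder is created.
    set -- wget -r -N -l inf --no-remove-listing -P "$MIRROR_DIR" "$url"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

mirror_site http://www.websitename.dom
```

Run it with DRY_RUN=0 in the environment to perform the transfer; the site would then be saved under /tmp/mirrors/www.websitename.dom unless MIRROR_DIR says otherwise.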

wget is a great tool, and although its primary function is to deal with Web sites, it can download from FTP servers as well. If you are transferring from anonymous sites, the format is almost identical to the one used to mirror a Web site:

wget -m ftp://ftp.ftpsitename.dom

If, on the other hand, you want to back up a user directory where a user name and password are required, you need to be a little fancier:


wget -m ftp://username:password@ftp.sitename.dom

This approach has a couple of downsides. First, your password is sent across the network in plain text, which may not be a big deal depending on how much you trust your network. Second, the password sits on the command line, where it can land in your shell history and show up in the process list for other local users to see. In a pinch, you could do a recursive secure copy with the scp command instead. Because scp is part of OpenSSH, you have the advantage of knowing that you are using secure, encrypted file transfers. Pretend that you want to copy your whole Web site, starting from the Apache server root. It would look something like this:


scp -rpv /var/www root@remote_host:/mnt/backupdir
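For repeated use, the scp command above could be wrapped in a small function, sketched here under a few assumptions. The name backup_site is hypothetical, and remote_host is only the placeholder from the example, not a real machine, so the sample call is left commented out. The comments also spell out what the three scp flags do.

```shell
#!/bin/sh
# Sketch: push a local tree to a backup host with scp.
# backup_site is a made-up helper name; adjust the paths to your own setup.
backup_site() {
    src=$1
    dest=$2
    # Stop early if the source tree is missing instead of starting a transfer.
    if [ ! -d "$src" ]; then
        echo "source directory $src not found" >&2
        return 1
    fi
    # -r copies directories recursively, -p preserves modification times,
    # access times and modes, and -v prints verbose progress and debug output.
    scp -rpv "$src" "$dest"
}

# Example call, commented out because remote_host is a placeholder:
# backup_site /var/www root@remote_host:/mnt/backupdir
```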

______________________

Comments

Re: Cooking with Linux: Mirror, Mirror, of It All

Hi,

> As it copies, it maintains permissions and modification dates and times, and it does it fast.

I do not see any way to have ftpcopy keep the file mode; all mirrored files are 0644. How did you make that work?

Lol
