rsync, Part I
Andrew Tridgell's rsync is a useful file-transfer tool, one that has no encryption support of its own but is easily “wrapped” (tunneled) by encryption tools such as SSH and Stunnel. What differentiates rsync (which, like scp, is based on rcp) is that it has the ability to perform differential downloads and uploads of files.
For example, if you wish to update your local copy of a 10MB file, and the newer version on the remote server differs in only three places totaling 150KB, rsync will automatically download only the differing 150KB (give or take a few KB) rather than the entire file. This functionality is provided by the rsync algorithm, invented by Andrew Tridgell and Paul Mackerras, which rapidly creates and compares rolling checksums of both files and thus determines which parts of the new file to download and add/replace on the old one.
Because this is a much more efficient use of the network, rsync is especially useful over slow network connections. It does not, however, have any performance advantage over rcp in copying files that are completely new to one side or the other of the transaction. By definition, differential copying requires that there be two files to compare.
In summary, rsync is by far the most intelligent file-transfer utility in common use, one that is both amenable to encrypted sessions and worth taking the trouble to figure out how. Using rsync securely is the focus of the remainder of this article.
rsync supports a long list of options, most of them relevant to specific aspects of maintaining software archives, mirrors and backups. Only those options directly relevant to security will be covered in depth here, but the rsync(8) man page will tell you anything you need to know about these other features.
Because Andrew Tridgell, rsync's original lead developer, is also one of the prime figures in the Samba Project, rsync's home page is part of the Samba web site, rsync.samba.org. That, of course, is the definitive source of all things rsync. The resources page, rsync.samba.org/resources.html, has links to some excellent off-site rsync documentation.
The latest rsync source code is available at rsync.samba.org/ftp/rsync, with binary packages for Debian, LinuxPPC and Red Hat Linux at rsync.samba.org/ftp/rsync/binaries. rsync is already considered a standard Linux tool and therefore is included in all popular Linux distributions; you probably needn't look further than the Linux installation CD-ROMs to find an rsync package for your system.
However, there are security bugs in the zlib implementation included in rsync prior to rsync v.2.5.4. These bugs are applicable regardless of the version of your system's shared zlib libraries. There is also an annoying bug in v2.5.4 itself, which causes rsync sometimes to copy whole files needlessly. I therefore recommend you run no version earlier than rsync v.2.5.5.
Happily, compiling rsync from source is fast and easy. Simply unzip and untar the archive, change your working directory to the top-level directory of the source code, type ./configure, and if this script finishes without errors, type make && make install.
Once rsync is installed, you can use it several ways. The first and most basic is to use rcp as the transport, which requires any host to which you connect to have the shell service enabled (i.e., in.rshd) in inetd.conf. Don't do this! The reason the Secure Shell was invented was because of a complete lack of support for strong authentication in the “r” services (rcp, rsh and rlogin), which led to their being used as entry points by many successful intruders over the years.
Therefore, I won't describe how to use rsync with rcp as its transport. However, you may wish to use this method between hosts on a trusted network; if so, ample information is available in both rsync's and in.rshd's respective man pages.
A much better way to use rsync than the rcp method is by specifying the Secure Shell as the transport. This requires that the remote host be running sshd and that the rsync command is present (and in the default paths) of both hosts. If you haven't set up sshd yet, do that first.
Suppose you have two hosts, near and far, and you wish to copy the local file thegoods.tgz to far's /home/near.backup directory, which you think may already contain an older version of thegoods.tgz. Assuming your user name, yodeldiva, exists on both systems, the transaction might look like this:
yodeldiva@near:~> rsync -vv -e ssh \ ./thegoods.tgz far:~
Let's dissect the command line. rsync has only one binary executable, rsync, which is used both as the client command and, optionally, as a dæmon. In this example, it's present on both near and far, but it runs on a dæmon on neither; sshd is acting as the listening dæmon on far.
The first rsync option in the above example is -vv, which is the nearly universal UNIX shorthand for “very verbose”. It's optional, but instructive. The second option is -e, with which you can specify an alternative to rsync's default remote-copy program rcp. Because rcp is the default, and because rcp and SSH are the only supported options, -e is used to specify SSH in practice.
Perhaps surprisingly, -e scp will not work, because prior to copying any data, rsync needs to pass a remote rsync command via SSH to generate and return rolling checksums on the remote file. In other words, rsync needs the full functionality of the ssh command to do its thing, so specify this rather than scp if you use the -e option.
After the options come rsync's actionable arguments, the local and remote files. The syntax for these is very similar to rcp's and scp's. If you immediately precede either filename with a colon, rsync will interpret the string preceding the colon as a remote host's name. If the user name you wish to use on the remote system is different from your local user name, you can specify it by immediately preceding the hostname with an @ sign and preceding that with your remote user name. In other words, the full rsync syntax for filenames is the following:
There must be at least two filenames. The right-most must be the destination file or path, and the others must be source files. Only one of these two may be remote, but both may be local (colon-less), which lets you perform local differential file copying—this is useful if, for example, you need to back up files from one local disk or partition to another.
The source file specified is ./thegoods.tgz, an ordinary local file path, and the destination is far:~, which translates to “my home directory on the server far”. If your user name on far is different from your local user name, say yodelerwannabe rather than yodeldiva, use the destination yodelerwannabe@far:~.