Automating Remote Backups

 in
Most home users don't think about backups till their disks crash. With a little bit of upfront work, you can automate your home backups and sleep easy at night.
User Data Backups

The system configuration backup script and the database backup script are run first to generate backups to user data space. Once complete, all data is ready to be backed up to the remote system with an rsync-based script:


#!/bin/bash
function checkRC
{
    rc=$1
    name=$2
    if [ $rc != 0 ]
    then
        echo "== $name failed with rsync rc=$rc =="
    fi
}
LOGIN=root@feynman
BRAHE=$LOGIN:/media/BackupDrive/feynman
if [ "$1" != "" ]
then
    BRAHE=$1
fi

The script includes a shell function to test rsync's return code and print an error message on failure. The front-end script redirects output from this script to a file, so error messages show up in the e-mailed backup report.

The default destination for the backup is configured at the start of the script. The first command-line argument can be used to override the default:

DIR1="/home/httpd"
DIR2="/home/mjhammel"
EXCL2=--exclude-from=/home/mjhammel/.rsync/local

The user data backup script is focused on directories. Unlike the other backup scripts, the list of items to back up are hard-coded in separate variables. Again, this is a brute-force method used for simplicity, because each directory to back up may have one or more sets of include and exclude arguments. Associative arrays could be used instead of the set of variables in a more generalized version of this script.

Notice that this configuration calls out individual directories under /home instead of backing up all of /home. The script from which this was pulled is used on a machine with development directories under /home that do not need to be backed up. Specifying /home and using an exclusion file is an alternative way of doing the same thing:

DATE=`date`
echo "== Backing up `uname -n` to $BRAHE."
echo "== Started @ $DATE "
echo "== Directory: $DIR1"
rsync -aq --safe-links $DIR1 $BRAHE
checkRC $? "$DIR1"

The first directory is backed up to the remote system. The -a option tells rsync to operate in archive mode, where rsync will do the following:

  • Recursively traverse the specified directory tree.

  • Copy symlinks as symlinks and not the files they point to.

  • Preserve owner, groups, permissions and modification times.

  • Preserve special files, such as device files.

The safe-links option tells rsync to ignore symbolic links that point to files outside the current directory tree. This way, restoration from the archive won't include symbolic links that may point to locations that no longer exist. The -q option tells rsync to run with as few non-error messages as possible:

echo "== Directory: $DIR2"
rsync -aq --safe-links $EXCL2 $DIR2 $BRAHE
checkRC $? "$DIR2"
DATE=`date`
echo "Backups complete @ $DATE"

The second directory tree is backed up using an exclusion list. This list is a file that specifies the files and directories within the current directory tree to be ignored by rsync. Entries in this file prefixed with a dash are excluded from the set of files and directories rsync will process. The three asterisks match anything below the specified directories:

- /mjhammel/.gvfs/***
- /mjhammel/Videos/***
- /mjhammel/brahe/***
- /mjhammel/iso/***

This example shows that no files under the Videos and iso directories will be included in the backup. It would be a poor use of disk space to back up files that exist in your home directory but that also can be retrieved from the Internet.

The brahe reference is a mountpoint for the home directory of an identical user ID on a remote system. This allows access to files under a login on another system simply by changing into the remote system's local mountpoint. But, there is no reason to back up those remote files on the local system, as that remote system has its own backup scripts configured.

The full version of this script includes an SSH-based verification that the remote system has the required external USB drive mounted and it is available for use. This allows the script to recognize that the remote system is misbehaving before wasting time trying to run a backup that would fail anyway.

Automation via Cron

The order in which these scripts is run is important. The system configuration file backup script and the database backup script can run in parallel but must complete before the user data backup script is run:


30 0 * * * /path/to/backup-db.pl
30 1 * * * /path/to/backup-configfiles.sh \
           /path/to/save/dir 2>&1 > /dev/null
30 2 * * * /path/to/backup-frontend.sh \
           /path/to/backup-data.sh

To pass arguments to backup-data.sh, enclose the entire command in double quotes:

30 2 * * * /path/to/backup-frontend.sh \
          "/path/to/backup-data.sh root@copernicus:/backups"

Each morning, the backup report is available for each machine that runs these scripts and can be reviewed to make sure the backups completed successfully. In practice, the most common problems encountered are related to unmounted or non-functioning drives, or to network outages that occur before or during the backup process.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState