A Linux-Based Automatic Backup System
Frequently people take computers for granted. This behavior becomes very dangerous when people rely on a computer to store and manipulate important data but fail to back up those data. If you are reading this, then you are probably aware of the need for reliable backups. However, you may work with people who are not, and your job may be seriously affected by a loss of their data.
I work in a scientific research group. Our laboratories are modern, and almost all of our data acquisition is performed by computers running Windows 95. In essence, our whole business is to acquire information that is stored on computers. Data loss can end up costing thousands of dollars, especially when one considers the salaries of all the people who helped produce that data.
To protect our group from data loss, I proposed an automatic, network-based backup system for our irreplaceable data. The costs were negligible (we had a 486/66 computer that was not in use and a 3GB hard disk that cost us little more than one hundred dollars). I went through several versions of this system over the past two years, starting with a Windows 95-based system and ending up with a fast, powerful Linux-based system. The current version is easy to implement, inexpensive, powerful and reliable. Assuming you have a networked Linux machine ready, you should be able to use this article to set up your own automatic backup system in a short time.
All the tools that are needed for the automatic backup system are included with most Linux distributions. The first is Samba, an excellent open-source package that allows UNIX-type systems to communicate with Windows-based systems over a TCP/IP network. The Linux version includes a utility called smbmount. It uses the smb file system kernel support unique to Linux, allowing any directories on Windows computers to be mounted to the Linux file system and manipulated as if they were on the Linux machine's hard disk. This will allow the archiving programs (in their update mode) to check to see if a file on the Windows machine needs to be backed up before it is transferred through the network, thereby reducing the network bandwidth requirements, CPU load and hard disk wear dramatically.
There are numerous archiving programs available for Linux, including tar, bzip2, and even the simple cp command. However, I chose to use tools from the open-source Info-ZIP project. These tools are included with most Linux distributions, are available for various other platforms, are fast and small, and use an established file standard for Windows systems. Furthermore, the compression abilities of the Info-ZIP tools allow one to significantly reduce the size of the file archives on the Linux backup system.
Network shares (a hard drive or any directory with all its subdirectories) must be set up on the Windows computers to be backed up. If file sharing is not already enabled, you can set it up from the Windows network control panel. Then, in the Windows Explorer, right-click on the drive or folder you want to access from the network and choose the Sharing option from the pop-up menu. I recommend allowing read-only access so that crackers cannot alter or destroy your data if they somehow obtain your passwords. Make sure to record the names of these shares. It is a good idea to place the NetBIOS names, DNS names and IP addresses of the Windows computers in the /etc/hosts file of the Linux machine (as directed by the comments in /etc/hosts), especially if your computers lie across different subnets.
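As a rough illustration of the /etc/hosts entries described above, the following sketch writes three example lines to a scratch file for review. The host names are the ones used later in this article; the IP addresses and the example.com domain are placeholders I made up, so substitute your own values.

```shell
#!/bin/bash
# Write example entries to a scratch file for review; nothing here touches /etc/hosts.
hosts_fragment=$(mktemp)

cat > "$hosts_fragment" <<'EOF'
# IP address     DNS name               NetBIOS name
192.168.1.11     higgins.example.com    higgins
192.168.1.12     rick.example.com       rick
192.168.2.13     tc.example.com         tc
EOF

cat "$hosts_fragment"
# Once the addresses are correct, append the lines as root:
#   cat "$hosts_fragment" >> /etc/hosts
```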
Once this is done, you must prepare your Linux system to access and store the data. First create a mount point for the Windows shares by typing mkdir /mnt/smb. After that, you must decide where you will put the archived backups.
I put the backup files on a separate 1GB vfat (Windows) partition that remains unmounted at all times except when the actual backup processes are running. This way, the files are protected as much as possible from file system damage due to power outages, and the hard drive can be temporarily removed from the Linux computer and put into a Windows computer to facilitate recovery. In order to accommodate this, I created a mount point called /mnt/backups.
A script is a text file containing commands that one would normally type at the Linux command prompt. You can use scripts to accomplish very complex tasks easily and repeatedly. Making a script is as simple as typing the text into your favorite editor, saving it and then using the chmod u+x command on the file.
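The steps just described can be tried out safely in a scratch directory. This is only a minimal sketch: the file name hello-backup and its contents are invented for the demonstration.

```shell
#!/bin/bash
# Hypothetical demo: write a tiny script to a scratch directory and run it.
workdir=$(mktemp -d)
script="$workdir/hello-backup"

# A script is just the commands you would otherwise type at the prompt.
cat > "$script" <<'EOF'
#!/bin/bash
echo "backup script ran"
EOF

# Give the owner execute permission, then run the script by its path.
chmod u+x "$script"
output=$("$script")
echo "$output"
```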
Listing 1 shows the script that backs up the DATA directory from the d_drive share on the computer named “higgins”. This script runs on my Linux computer, “magnum”, and is stored as the file root/backup/higgins.
The first line, while looking like a comment, actually instructs the computer to use bash to execute the script. Next come all the shell variables that the main part of the script will use to back up the data on higgins. This practice of putting the case-specific values in variables at the beginning of the script allows the user to make new versions for new computers very quickly by copying the basic script and changing a few easily seen values. Listing 2 shows a different set of variables for a Windows 98 machine (“rick” with a shared C: drive) and a Windows NT machine (“tc” with a shared folder named “data”). Note how the Windows NT variables need to specify a user name and the password associated with that user name.
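To make the idea of a variable block concrete, here is a sketch of what the top of such a script might look like for higgins. The variable names (other than PASSWD) and the archive file name are my own illustrative choices and may differ from those in the listings; the values come from the setup described in this article.

```shell
#!/bin/bash
# Case-specific values for one computer (variable names are illustrative).
SERVER="higgins"          # NetBIOS name of the Windows machine
SHARE="d_drive"           # share name recorded when the share was created
SOURCE_DIR="DATA"         # directory inside the share to back up
USERNAME=""               # left empty here; a Windows NT machine needs a user name
PASSWD=""                 # placeholder for the share password
MOUNT_POINT="/mnt/smb"
ARCHIVE="/mnt/backups/$SERVER.zip"   # hypothetical archive name

# Values derived from the settings above, used by the rest of the script.
SERVICE="//$SERVER/$SHARE"
SOURCE_PATH="$MOUNT_POINT/$SOURCE_DIR"

echo "service: $SERVICE"
echo "source:  $SOURCE_PATH"
echo "archive: $ARCHIVE"
```

Adapting the script to another computer then means editing only the first few assignments.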
The remaining lines actually do the work. The command export PASSWD puts the password in an environment variable that the smbmount program reads automatically. The smbumount command is executed next in case someone forgot to unmount an SMB share from the mount point. (If there is nothing there, smbumount returns a harmless error message and the script continues.) The smbmount program then attempts to mount the remote share. The -N switch instructs it not to prompt for a password, so the value of the PASSWD environment variable is used instead. The -n switch communicates the username to smbmount.
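The reason export matters is that an ordinary shell variable is not passed on to programs the script starts. The sketch below shows only that mechanism, using a child shell as a stand-in for smbmount; the password value is a placeholder.

```shell
#!/bin/bash
# Start from a clean slate so the demonstration does not depend on the caller.
unset PASSWD

# Placeholder value; a real script would set the share's password here.
PASSWD="example-password"

# Before export, a child process does not see the shell variable.
before=$(sh -c 'echo "${PASSWD:-unset}"')

# After export, every program started by the script inherits it.
export PASSWD
after=$(sh -c 'echo "${PASSWD:-unset}"')

echo "before export: $before"
echo "after export:  $after"
```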
An if statement checks whether the files to be backed up actually exist before doing any backup work, in case the network is down or the remote computer is switched off. If they cannot be found, the script terminates after making the mount point available again.
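A check of this kind might look like the following sketch. The function name source_is_available is hypothetical, and a temporary directory stands in for /mnt/smb so the example can run without a network share.

```shell
#!/bin/bash
# Succeed only if the directory to back up is visible under the mount point.
source_is_available() {
    local mount_point=$1 source_dir=$2
    [ -d "$mount_point/$source_dir" ]
}

# Stand-in for /mnt/smb so the sketch runs without a real share.
MOUNT_POINT=$(mktemp -d)
SOURCE_DIR="DATA"

# Nothing is "mounted" yet, so the check fails.
if source_is_available "$MOUNT_POINT" "$SOURCE_DIR"; then
    status_before="available"
else
    status_before="missing"   # a real script would unmount and exit here
fi

# Simulate a successful mount by creating the directory.
mkdir "$MOUNT_POINT/$SOURCE_DIR"
if source_is_available "$MOUNT_POINT" "$SOURCE_DIR"; then
    status_after="available"
else
    status_after="missing"
fi

echo "before mount: $status_before"
echo "after mount:  $status_after"
```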
If the Linux machine can access the remote files, all archiving is done with the zip command. The -r switch is the standard recursion option, which makes zip go through every subfolder of the data directory. The -u switch puts zip in update mode, where it will only add files that are not already archived or replace those that have changed. The -v switch instructs zip to verbosely show the name of every file it checks on the display, which is useful for troubleshooting.
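The zip invocation can be tried on scratch data before pointing it at a real share. In this sketch the function name update_archive, the directory layout and the archive name are all invented for illustration. It also assumes two behaviors of Info-ZIP zip that are worth verifying on your own system: that -u creates the archive if it does not yet exist, and that zip exits with status 12 when it finds nothing to update.

```shell
#!/bin/bash
# Update (or create) a zip archive of one directory.
# Arguments: parent directory, directory name inside it, archive path.
update_archive() {
    local parent=$1 dir=$2 archive=$3 status
    # -r recurse into subfolders, -u add only new or changed files, -v verbose.
    ( cd "$parent" && zip -r -u -v "$archive" "$dir" )
    status=$?
    # Assumption: exit status 12 means "nothing to do", which is not a failure here.
    if [ "$status" -eq 12 ]; then
        status=0
    fi
    return "$status"
}

# Scratch data standing in for the mounted share and the backup partition.
work=$(mktemp -d)
mkdir -p "$work/share/DATA/run1" "$work/backups"
echo "sample" > "$work/share/DATA/run1/results.txt"

# First run adds the file; second run finds nothing new to add.
update_archive "$work/share" DATA "$work/backups/higgins.zip"; first=$?
update_archive "$work/share" DATA "$work/backups/higgins.zip"; second=$?
echo "first run: $first, second run: $second"
```

Changing into the parent directory first keeps the paths stored in the archive relative to the share rather than to the mount point.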
After a backup script has been set up for each computer, you can make a simple script named master to call each of the backup scripts sequentially. An example of my master script is shown in Listing 3.
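For readers without the listing at hand, a master script along these lines might look like the sketch below. The function name run_backups is hypothetical, and continuing with the next computer after a failure is my own design choice rather than something the article prescribes. The demonstration uses stand-in scripts in a scratch directory instead of the real backup scripts.

```shell
#!/bin/bash
# Run each per-computer backup script in turn; keep going if one fails.
# Arguments: directory holding the scripts, then the script names.
run_backups() {
    local script_dir=$1 name failures=0
    shift
    for name in "$@"; do
        echo "=== backing up $name ==="
        if ! "$script_dir/$name"; then
            echo "backup of $name failed" >&2
            failures=$((failures + 1))
        fi
    done
    return "$failures"
}

# Stand-in scripts named after the computers in this article.
demo_dir=$(mktemp -d)
for name in higgins rick tc; do
    printf '#!/bin/bash\necho "%s done"\n' "$name" > "$demo_dir/$name"
    chmod u+x "$demo_dir/$name"
done

log=$(run_backups "$demo_dir" higgins rick tc)
result=$?
echo "$log"
```

The function's exit status is the number of scripts that failed, so a scheduler or a wrapper can tell at a glance whether every backup succeeded.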