A Linux-Based Automatic Backup System

December 1st, 2000 by Michael O'Brien in

A step-by-step procedure for establishing a backup system that will save time and money.
Your rating: None Average: 4 (5 votes)

Frequently people take computers for granted. This behavior becomes very dangerous when people rely on a computer to store and manipulate important data but fail to back up those data. If you are reading this, then you are probably aware of the need for reliable backups. However, you may work with people who are not, and your job may be seriously affected by a loss of their data.

I work in a scientific research group. Our laboratories are modern, and almost all of our data acquisition is performed by computers running Windows 95. In essence, our whole business is to acquire information that is stored on computers. Data loss can end up costing thousands of dollars, especially when one considers the salaries of all the people who helped produce that data.

To protect our group from data loss, I proposed an automatic, network-based backup system for our irreplaceable data. The costs were negligible (we had a 486/66 computer that was not in use and a 3GB hard disk that cost us little more than one hundred dollars). I went through several versions of this system over the past two years, starting with a Windows 95-based system and ending up with a fast, powerful Linux-based system. The current version is easy to implement, inexpensive, powerful and reliable. Assuming you have a networked Linux machine ready, you should be able to use this article to set up your own automatic backup system in a short time.

Necessary Tools

All the tools that are needed for the automatic backup system are included with most Linux distributions. The first is Samba, an excellent open-source package that allows UNIX-type systems to communicate with Windows-based systems over a TCP/IP network. The Linux version includes a utility called smbmount. It uses the smb file system kernel support unique to Linux, allowing any directories on Windows computers to be mounted to the Linux file system and manipulated as if they were on the Linux machine's hard disk. This will allow the archiving programs (in their update mode) to check to see if a file on the Windows machine needs to be backed up before it is transferred through the network, thereby reducing the network bandwidth requirements, CPU load and hard disk wear dramatically.

There are numerous archiving programs available for Linux, including tar, bzip2, and even the simple cp command. However, I chose to use tools from the open-source Info-ZIP project. These tools are included with most Linux distributions are available for various other platforms, are fast and small, and use an established file standard for Windows systems. Furthermore, the compression abilities of the Info-ZIP tools allow one to significantly reduce the size of the file archives on the Linux backup system.

Preliminary Steps

Network shares (a hard drive or any directory with all its subdirectories) must be set up on the Windows computers to be backed up. If file sharing is not already enabled, you can set it up from the Windows network control panel. Then, in the Windows Explorer, right click on the drive or folder you want to access from the network and choose the Sharing option from the pop-up menu. I recommend allowing read-only access so that crackers cannot alter or destroy your data if they somehow obtain your passwords. Make sure to record the names of these shares. It is a good idea to place the netbios names, DNS names and IP numbers of the Windows computers in your /etc/hosts file of the Linux machine (as directed by the comments in /etc/hosts), especially if your computers lie across different subnets.

Once this is done, you must prepare your Linux system to access and store the data. First create a mount point for the Windows shares by typing mkdir /mnt/smb. After that, you must decide where you will put the archived backups.

I put the backup files on a separate 1GB vfat (Windows) partition that remains unmounted at all times except when the actual backup processes are running. This way, the files are protected as much as possible from file system damage due to power outages, and the hard drive can be temporarily removed from the Linux computer and put into a Windows computer to facilitate recovery. In order to accommodate this, I created a mount point called /mnt/backups.

Scripts

A script is a text file containing commands that one would normally type at the Linux command prompt. You can use them to easily accomplish very complex tasks repeatedly. Making a script is as simple as typing the text into your favorite editor, saving it and then using the chmod u+x command on the file.

Listing 1 shows the script that backs up the DATA directory from the d_drive share on the computer named “higgins”. This script runs on my Linux computer, “magnum”, and is stored as the file root/backup/higgins.

Listing 1. DATA Directory Backup

The first line, while looking like a comment, actually instructs the computer to use bash to execute the script. Next comes all the shell variables that the main part of the script will use to back up the data on higgins. This practice of putting the case-specific values in variables at the beginning of the script allows the user to make new versions for new computers very quickly by copying the basic script and changing a few easily seen values. Listing 2 shows a different set of variables for a Windows 98 machine (“rick” with a shared C: drive) and a Windows NT machine (“tc” with a shared folder named “data”). Note how the Windows NT variables need to specify a user name and the password associated with that username.

Listing 2. Variables for the Windows Machines

The remaining lines actually do the work. The command export PASSWD puts the password in an environment variable that the smbmount program reads automatically. The smbumount command is executed next in case someone forgot to unmount an SMB share from the mount point. (If there is nothing there, smbumount returns a harmless error message and the script continues.) The smbmount program then attempts to mount the remote share. -N switch instructs it not to ask for a password to replace the value of the PASSWD environment variable. The -n switch communicates the username to smbmount.

An if statement checks to see if the specified backup files actually exist before doing any backup work in case the network may be down or the remote computer is switched off. In this case the script will terminate after making the mount point available again.

If the Linux machine can access the remote files, all archiving is done with the zip command. The -r switch is the standard recursion option, which makes zip go through every subfolder of the data directory. The -u puts zip in update mode, where it will only add or change files that are not already archived or those that have changed. The -v parameter instructs zip to verbosely show the names of every file it checks on the display—a useful option for troubleshooting.

After a backup script has been set up for each computer, you can make a simple script named master to call each of the backup scripts sequentially. An example of my master script is shown in Listing 3.

Listing 3. Master Script

Activating the System

After all the scripts have been written, you can put a symbolic link to the master script in one of the /etc/cron.d subdirectories so that the computer will take care of the backups automatically. For my setup, I typed ln -s /root/backup/master /etc/cron.d/weekly/master to set automatic weekly backups. You can back up on a daily basis if you need to since the update option of archiving utilities minimizes resource requirements.

The first usage of a backup script, however, will require a lot of network bandwidth and CPU time. Hence, you may want to consider running it for the first time by hand or with the at command at night.

Caveats

Five important points should be noted:

  1. Any shell script with passwords should be made unreadable by anyone but the owner by using the chmod go-r command.

  2. If your data is very sensitive, you need to set up adequate security measures to keep industrial spies from hacking into your Linux machine and stealing your centralized data. See the Linux security HOWTO for more information.

  3. The smbmount program tends to vary slightly across different distributions of Linux. Hence, if the scripts in this article don't work quite right for you, check out the man pages to see how your version of smbmount handles its command-line options.

  4. Users of the Windows computers must be taught to keep their data under a central directory, such as “users” or “data”, instead of several random directories spread across the hard drives. Some people are too lazy to move their files into a central directory, despite the fact that it takes only five seconds. You may have to actually move their files yourself before they will even start using the centralized directory. Remember, though, that these users may be the greatest threat to your organization in terms of data loss since they never bother to make backup copies of their own data.

  5. Finally, a hard drive is a very practical place to put the backups of irreplaceable data. My archive files use less than 400MB of hard disk and contain more than a 1.5GB worth of data. However, you may want to consider obtaining a large-capacity, removable drive for your Linux machine. With this, you can occasionally copy the archive files from your hard disk to a removable disk and take them home in case of physical destruction or theft of the machine.

Conclusion

A Linux-based network backup system for irreplaceable data files on many networked computers is inexpensive, reliable, easy to set up, trivial to expand and extremely practical. With just an hour of time you can potentially save your group or company many thousands of dollars in the case of a hard drive crash. Currently, my Pentium 150 workstation keeps archives of years of mission-critical data from eight computers spread across three buildings and two subnets. It takes me less than two minutes to add a new computer to the system due to the use of shell variables in the scripts.

This is the kind of task Linux was born to do. You can take an old surplus computer, make it “headless” with no keyboard or monitor and stick it somewhere in a closet where it will humbly do its work unseen. You can also run it on your personal workstation since the Linux tools can run in the background. You can set up an FTP server on the Linux machine on the fly if you need to restore files to a crashed computer or simply take the hard drive out and stick it inside a Windows machine. Since Linux has been designed to coexist with many different computers and operating systems, one can adapt the scripts to back up many different kinds of computers, including other Linux machines via NFS and even MacIntosh computers with the netatalk and hfs packages.

Resources

Michael O'Brien is a graduate student at the University of New Mexico where he studies optics. Computers are both a hobby and tool that end up helping him get his work done. He manages a small computer room in his spare time and likes to help Linux users on the Usenet newsgroups. He may be reached at mobrien@unm.edu.

__________________________


Special Magazine Offer -- Free Gift with Subscription
Receive a free digital copy of Linux Journal's System Administration Special Edition as well as instant online access to current and past issues. CLICK HERE for offer

Linux Journal: delivering readers the advice and inspiration they need to get the most out of their Linux systems since 1994.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
PN's picture

Good Stuff

On September 11th, 2007 PN (not verified) says:

Hi O'brien. I found your tutorial very useful.

We are setting up a LINUX based Server architecture for backing up and providing services on our network and we have found this the easiest to follow tutorial.

Thanks a million

Anonymous's picture

Nice Tutorial !!!

On May 4th, 2008 Anonymous (not verified) says:

Very nice tutorial <

comment
I rather use rsync and a large harddrive instead of compressing the data to avoid corrupted archives

I'm building an AMD Athelon 1800+ Backup Server ,using ubuntu ,and I'll use your scripts
after modifications :)

thanx

Post new comment

Please note that comments may not appear immediately, so there is no need to repost your comment.
The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <pre> <ul> <ol> <li> <dl> <dt> <dd> <i> <b>
  • Lines and paragraphs break automatically.

More information about formatting options

Newsletter

Each week Linux Journal editors will tell you what's hot in the world of Linux. You will receive late breaking news, technical tips and tricks, and links to in-depth stories featured on www.linuxjournal.com.
Sign up for our Email Newsletter

Tech Tip Videos

From the Magazine

July 2009, #183

News Flash: Linux Kernel 3.0 to include an on-the-go Expresso machine interface! Ok, maybe not, but Linux is definitely going mobile, from phones to e-readers. Find out more inside about Android, the Kindle 2, the Western Digital MyBook II, The Bug, and Indamixx (a portable recording studio). And if you've gone mobile and you been wanting more Emacs in your life then check out Conkeror.


To compliment the mobile we've got the stationary: parsing command line options with getopt, checking your Ruby code with metric_fu, and building a secure Squid proxy. How is this stationary you ask? What can we say? It's not. We just wanted to see if anybody actually read this part of the page :) .


All this and more, and all you have to do is get your hot sweaty hands on the latest copy of Linux Journal.





Read this issue