A Linux-Based Automatic Backup System

A step-by-step procedure for establishing a backup system that will save time and money.

Frequently people take computers for granted. This behavior becomes very dangerous when people rely on a computer to store and manipulate important data but fail to back up those data. If you are reading this, then you are probably aware of the need for reliable backups. However, you may work with people who are not, and your job may be seriously affected by a loss of their data.

I work in a scientific research group. Our laboratories are modern, and almost all of our data acquisition is performed by computers running Windows 95. In essence, our whole business is to acquire information that is stored on computers. Data loss can end up costing thousands of dollars, especially when one considers the salaries of all the people who helped produce that data.

To protect our group from data loss, I proposed an automatic, network-based backup system for our irreplaceable data. The costs were negligible (we had a 486/66 computer that was not in use and a 3GB hard disk that cost us little more than one hundred dollars). I went through several versions of this system over the past two years, starting with a Windows 95-based system and ending up with a fast, powerful Linux-based system. The current version is easy to implement, inexpensive, powerful and reliable. Assuming you have a networked Linux machine ready, you should be able to use this article to set up your own automatic backup system in a short time.

Necessary Tools

All the tools that are needed for the automatic backup system are included with most Linux distributions. The first is Samba, an excellent open-source package that allows UNIX-type systems to communicate with Windows-based systems over a TCP/IP network. The Linux version includes a utility called smbmount. It uses the smb file system kernel support unique to Linux, allowing any directories on Windows computers to be mounted to the Linux file system and manipulated as if they were on the Linux machine's hard disk. This will allow the archiving programs (in their update mode) to check to see if a file on the Windows machine needs to be backed up before it is transferred through the network, thereby reducing the network bandwidth requirements, CPU load and hard disk wear dramatically.

There are numerous archiving programs available for Linux, including tar, bzip2, and even the simple cp command. However, I chose to use tools from the open-source Info-ZIP project. These tools are included with most Linux distributions are available for various other platforms, are fast and small, and use an established file standard for Windows systems. Furthermore, the compression abilities of the Info-ZIP tools allow one to significantly reduce the size of the file archives on the Linux backup system.

Preliminary Steps

Network shares (a hard drive or any directory with all its subdirectories) must be set up on the Windows computers to be backed up. If file sharing is not already enabled, you can set it up from the Windows network control panel. Then, in the Windows Explorer, right click on the drive or folder you want to access from the network and choose the Sharing option from the pop-up menu. I recommend allowing read-only access so that crackers cannot alter or destroy your data if they somehow obtain your passwords. Make sure to record the names of these shares. It is a good idea to place the netbios names, DNS names and IP numbers of the Windows computers in your /etc/hosts file of the Linux machine (as directed by the comments in /etc/hosts), especially if your computers lie across different subnets.

Once this is done, you must prepare your Linux system to access and store the data. First create a mount point for the Windows shares by typing mkdir /mnt/smb. After that, you must decide where you will put the archived backups.

I put the backup files on a separate 1GB vfat (Windows) partition that remains unmounted at all times except when the actual backup processes are running. This way, the files are protected as much as possible from file system damage due to power outages, and the hard drive can be temporarily removed from the Linux computer and put into a Windows computer to facilitate recovery. In order to accommodate this, I created a mount point called /mnt/backups.

Scripts

A script is a text file containing commands that one would normally type at the Linux command prompt. You can use them to easily accomplish very complex tasks repeatedly. Making a script is as simple as typing the text into your favorite editor, saving it and then using the chmod u+x command on the file.

Listing 1 shows the script that backs up the DATA directory from the d_drive share on the computer named “higgins”. This script runs on my Linux computer, “magnum”, and is stored as the file root/backup/higgins.

Listing 1. DATA Directory Backup

The first line, while looking like a comment, actually instructs the computer to use bash to execute the script. Next comes all the shell variables that the main part of the script will use to back up the data on higgins. This practice of putting the case-specific values in variables at the beginning of the script allows the user to make new versions for new computers very quickly by copying the basic script and changing a few easily seen values. Listing 2 shows a different set of variables for a Windows 98 machine (“rick” with a shared C: drive) and a Windows NT machine (“tc” with a shared folder named “data”). Note how the Windows NT variables need to specify a user name and the password associated with that username.

Listing 2. Variables for the Windows Machines

The remaining lines actually do the work. The command export PASSWD puts the password in an environment variable that the smbmount program reads automatically. The smbumount command is executed next in case someone forgot to unmount an SMB share from the mount point. (If there is nothing there, smbumount returns a harmless error message and the script continues.) The smbmount program then attempts to mount the remote share. -N switch instructs it not to ask for a password to replace the value of the PASSWD environment variable. The -n switch communicates the username to smbmount.

An if statement checks to see if the specified backup files actually exist before doing any backup work in case the network may be down or the remote computer is switched off. In this case the script will terminate after making the mount point available again.

If the Linux machine can access the remote files, all archiving is done with the zip command. The -r switch is the standard recursion option, which makes zip go through every subfolder of the data directory. The -u puts zip in update mode, where it will only add or change files that are not already archived or those that have changed. The -v parameter instructs zip to verbosely show the names of every file it checks on the display—a useful option for troubleshooting.

After a backup script has been set up for each computer, you can make a simple script named master to call each of the backup scripts sequentially. An example of my master script is shown in Listing 3.

Listing 3. Master Script

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

...

Rubyoxy's picture

Last time I checked, you needed a Windows CD to create the Windows PE one. So, what if my copy of Windows is the OEM one that came with the computer?

Good Stuff

PN's picture

Hi O'brien. I found your tutorial very useful.

We are setting up a LINUX based Server architecture for backing up and providing services on our network and we have found this the easiest to follow tutorial.

Thanks a million

Nice Tutorial !!!

Anonymous's picture

Very nice tutorial <

comment
I rather use rsync and a large harddrive instead of compressing the data to avoid corrupted archives

I'm building an AMD Athelon 1800+ Backup Server ,using ubuntu ,and I'll use your scripts
after modifications :)

thanx

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState