A Linux-Based Automatic Backup System
Frequently people take computers for granted. This behavior becomes very dangerous when people rely on a computer to store and manipulate important data but fail to back up those data. If you are reading this, then you are probably aware of the need for reliable backups. However, you may work with people who are not, and your job may be seriously affected by a loss of their data.
I work in a scientific research group. Our laboratories are modern, and almost all of our data acquisition is performed by computers running Windows 95. In essence, our whole business is to acquire information that is stored on computers. Data loss can end up costing thousands of dollars, especially when one considers the salaries of all the people who helped produce that data.
To protect our group from data loss, I proposed an automatic, network-based backup system for our irreplaceable data. The costs were negligible (we had a 486/66 computer that was not in use and a 3GB hard disk that cost us little more than one hundred dollars). I went through several versions of this system over the past two years, starting with a Windows 95-based system and ending up with a fast, powerful Linux-based system. The current version is easy to implement, inexpensive, powerful and reliable. Assuming you have a networked Linux machine ready, you should be able to use this article to set up your own automatic backup system in a short time.
All the tools that are needed for the automatic backup system are included with most Linux distributions. The first is Samba, an excellent open-source package that allows UNIX-type systems to communicate with Windows-based systems over a TCP/IP network. The Linux version includes a utility called smbmount. It uses the smb file system kernel support unique to Linux, allowing any directories on Windows computers to be mounted to the Linux file system and manipulated as if they were on the Linux machine's hard disk. This will allow the archiving programs (in their update mode) to check to see if a file on the Windows machine needs to be backed up before it is transferred through the network, thereby reducing the network bandwidth requirements, CPU load and hard disk wear dramatically.
There are numerous archiving programs available for Linux, including tar, bzip2, and even the simple cp command. However, I chose to use tools from the open-source Info-ZIP project. These tools are included with most Linux distributions are available for various other platforms, are fast and small, and use an established file standard for Windows systems. Furthermore, the compression abilities of the Info-ZIP tools allow one to significantly reduce the size of the file archives on the Linux backup system.
Network shares (a hard drive or any directory with all its subdirectories) must be set up on the Windows computers to be backed up. If file sharing is not already enabled, you can set it up from the Windows network control panel. Then, in the Windows Explorer, right click on the drive or folder you want to access from the network and choose the Sharing option from the pop-up menu. I recommend allowing read-only access so that crackers cannot alter or destroy your data if they somehow obtain your passwords. Make sure to record the names of these shares. It is a good idea to place the netbios names, DNS names and IP numbers of the Windows computers in your /etc/hosts file of the Linux machine (as directed by the comments in /etc/hosts), especially if your computers lie across different subnets.
Once this is done, you must prepare your Linux system to access and store the data. First create a mount point for the Windows shares by typing mkdir /mnt/smb. After that, you must decide where you will put the archived backups.
I put the backup files on a separate 1GB vfat (Windows) partition that remains unmounted at all times except when the actual backup processes are running. This way, the files are protected as much as possible from file system damage due to power outages, and the hard drive can be temporarily removed from the Linux computer and put into a Windows computer to facilitate recovery. In order to accommodate this, I created a mount point called /mnt/backups.
A script is a text file containing commands that one would normally type at the Linux command prompt. You can use them to easily accomplish very complex tasks repeatedly. Making a script is as simple as typing the text into your favorite editor, saving it and then using the chmod u+x command on the file.
Listing 1 shows the script that backs up the DATA directory from the d_drive share on the computer named “higgins”. This script runs on my Linux computer, “magnum”, and is stored as the file root/backup/higgins.
Listing 1. DATA Directory Backup
The first line, while looking like a comment, actually instructs the computer to use bash to execute the script. Next comes all the shell variables that the main part of the script will use to back up the data on higgins. This practice of putting the case-specific values in variables at the beginning of the script allows the user to make new versions for new computers very quickly by copying the basic script and changing a few easily seen values. Listing 2 shows a different set of variables for a Windows 98 machine (“rick” with a shared C: drive) and a Windows NT machine (“tc” with a shared folder named “data”). Note how the Windows NT variables need to specify a user name and the password associated with that username.
Listing 2. Variables for the Windows Machines
The remaining lines actually do the work. The command export PASSWD puts the password in an environment variable that the smbmount program reads automatically. The smbumount command is executed next in case someone forgot to unmount an SMB share from the mount point. (If there is nothing there, smbumount returns a harmless error message and the script continues.) The smbmount program then attempts to mount the remote share. -N switch instructs it not to ask for a password to replace the value of the PASSWD environment variable. The -n switch communicates the username to smbmount.
An if statement checks to see if the specified backup files actually exist before doing any backup work in case the network may be down or the remote computer is switched off. In this case the script will terminate after making the mount point available again.
If the Linux machine can access the remote files, all archiving is done with the zip command. The -r switch is the standard recursion option, which makes zip go through every subfolder of the data directory. The -u puts zip in update mode, where it will only add or change files that are not already archived or those that have changed. The -v parameter instructs zip to verbosely show the names of every file it checks on the display—a useful option for troubleshooting.
After a backup script has been set up for each computer, you can make a simple script named master to call each of the backup scripts sequentially. An example of my master script is shown in Listing 3.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- Designing Electronics with Linux
- New Products
- Linux Systems Administrator
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- Web & UI Developer (JavaScript & j Query)
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Dynamic DNS—an Object Lesson in Problem Solving
- Using Salt Stack and Vagrant for Drupal Development
- Reply to comment | Linux Journal
8 hours 17 min ago - Dynamic DNS
8 hours 52 min ago - Reply to comment | Linux Journal
9 hours 50 min ago - Reply to comment | Linux Journal
10 hours 40 min ago - Not free anymore
14 hours 42 min ago - Great
18 hours 29 min ago - Reply to comment | Linux Journal
18 hours 37 min ago - Understanding the Linux Kernel
20 hours 52 min ago - General
23 hours 22 min ago - Kernel Problem
1 day 9 hours ago
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




Comments
...
Last time I checked, you needed a Windows CD to create the Windows PE one. So, what if my copy of Windows is the OEM one that came with the computer?
Good Stuff
Hi O'brien. I found your tutorial very useful.
We are setting up a LINUX based Server architecture for backing up and providing services on our network and we have found this the easiest to follow tutorial.
Thanks a million
Nice Tutorial !!!
Very nice tutorial <
comment
I rather use rsync and a large harddrive instead of compressing the data to avoid corrupted archives
I'm building an AMD Athelon 1800+ Backup Server ,using ubuntu ,and I'll use your scripts
after modifications :)
thanx