An Automated Reliable Backup Solution
These days, it is common to fill huge hard drives with movies, music, videos, software, documents and many other forms of data. Manual backups to CD or DVD often are neglected because of the time-consuming manual intervention necessary to overcome media size limitations and data integrity issues. Hence, most of this data is not backed up on a regular basis. I work as a security professional, specifically in the area of software development. In my spare time, I am an open-source enthusiast and have developed a number of open-source projects. Given my broad spectrum of interests, I have a network in my home consisting of 12 computers, which run a combination of Linux, Mac OS X and Windows. Losing my work is unacceptable!
In order to function in my environment, a backup solution must accommodate multiple users of different machines, running different operating systems. All users must have the ability to back up and recover data in a flexible and unattended manner. This requires that data can be recovered at a granularity ranging from a single file to an entire archive stored at any specified date and time. Because multiple users can access the backup system, it is important to incorporate security functions, specifically data confidentiality, which prevents users from being able to see other users' data, and data integrity, which ensures that the data users recover from backups was originally created by them and was not altered.
In addition to security, reliability is another key requirement. The solution must be tolerant of individual hardware faults. In this case, the component most likely to fail is a hard drive, and therefore the solution should implement hard drive fault tolerance. Finally, the solution should use drive space and network bandwidth efficiently. Efficient use of bandwidth allows more users to back up their data simultaneously. Likewise, if hard drive space is used efficiently by each user, more data can be backed up. A few additional requirements that I impose on all of my projects are that they be visually attractive, of an appropriate size and reasonably priced.
I first attempted to find an existing solution. I found a number of solutions that fit into two categories: single-drive network backup appliances and RAID array network backup appliances. A prime example of a solution in the first category is the Western Digital NetCenter product. All of the products I found in this category failed in most, if not all, of the functionality, security, reliability and performance requirements. The appliances found in the second category are generally designed for enterprise use rather than personal use. Hence, they tend to be much more expensive than those found in the first category. The Snap Server 2200 is an example of one of the lower-end versions of an appliance that fits under the second category. It generally sells for about $1,000 US with a decent amount of hard drive space. The products I found in category two also failed in most, if not all, of the functionality, security, performance and general requirements.
Due to the excessive cost and requirements issues of the readily available solutions, I decided to build my own unattended, encrypted, redundant, network-based backup solution using Linux, Duplicity and commercial off-the-shelf (COTS) hardware. Using these tools allowed me to create a network appliance that could make full and incremental backups, both of which are encrypted and digitally signed. Incremental backups are backups in which only the changes since the last backup are saved. This reduces both the required storage and the required bandwidth for each backup. Full backups are backups in which the complete files, rather than just the changes, are backed up. These tools also provided the capability of restoring both entire archives and single files backed up at a specified time. For example, suppose my system was recently infected by a virus, and I know that a week ago it was clean. This solution would easily allow me to restore my system as it was one week ago, or two months ago, or as far back as my first backup.
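To make the difference between full and incremental backups concrete, here is a minimal Python sketch. It is a simplified, file-level illustration only and not how Duplicity works internally (Duplicity relies on librsync to compute deltas within files). The function names `snapshot` and `plan_backup` are hypothetical, and the sketch ignores deleted files.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to a SHA-256 digest of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def plan_backup(current, previous=None):
    """Return the sorted list of paths that a backup run would need to store.

    With no previous snapshot this is a full backup: every file is stored.
    With a previous snapshot it is an incremental backup: only files that
    are new or whose digest changed since that snapshot are stored.
    """
    if previous is None:
        return sorted(current)
    return sorted(
        path for path, digest in current.items() if previous.get(path) != digest
    )
```

Calling `plan_backup(snapshot(directory))` lists everything, as a full backup would. Passing the earlier snapshot as the second argument lists only what changed, which is why incremental runs need less storage and bandwidth.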
Duplicity, according to its project Web page, is a backup utility that backs up directories by encrypting tar-format volumes and uploading them to a remote or local file server. Duplicity, the cornerstone of this solution, is integrated with librsync, GnuPG and a number of file transport mechanisms. Duplicity provides a mechanism that meets my functionality, security and performance requirements.
Duplicity first uses librsync to create a tar-format volume consisting of either a full backup or an incremental backup. Then it uses GnuPG to encrypt and digitally sign the tar-format volume, providing the data confidentiality and integrity required. Once the tar-format volume is encrypted and signed, Duplicity transfers the backups to the specified location using one of its many supported file transportation mechanisms. In this case, I used the SSH file transportation mechanism, because it ensures that the backups are encrypted while in transit. This is not strictly necessary, as the backups are already encrypted and signed before being transported, but it adds another layer of protection and makes things harder for someone trying to break into the system. Furthermore, SSH is a commonly used service, so relying on it eliminates the need to install another service, such as FTP, NFS or rsync.
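As a rough illustration of the workflow just described, the following Python sketch assembles Duplicity command lines for an encrypted, signed backup over SSH and for a point-in-time restore. It only builds the argument lists and does not run anything. The helper names, the host `backup.example.invalid`, the paths and the GnuPG key IDs are all hypothetical placeholders. The options shown (`--encrypt-key`, `--sign-key`, `--time`, `--file-to-restore`) are taken from the Duplicity manual page, but option names and URL schemes have varied between releases, so check the documentation for your installed version.

```python
def backup_command(source_dir, target_url, encrypt_key, sign_key, full=False):
    """Build (but do not run) a Duplicity backup command line."""
    # "full" stores complete files; "incremental" stores only the changes.
    action = "full" if full else "incremental"
    return [
        "duplicity", action,
        "--encrypt-key", encrypt_key,  # GnuPG key used for confidentiality
        "--sign-key", sign_key,        # GnuPG key used for integrity
        source_dir, target_url,
    ]


def restore_command(source_url, dest_path, age="7D", file_to_restore=None):
    """Build (but do not run) a Duplicity restore command line.

    age uses Duplicity's interval format, for example "7D" for one week ago.
    file_to_restore, if given, is a path relative to the backed-up directory.
    """
    cmd = ["duplicity", "restore", "--time", age]
    if file_to_restore is not None:
        cmd += ["--file-to-restore", file_to_restore]
    return cmd + [source_url, dest_path]


# Hypothetical example values, for illustration only.
TARGET = "sftp://alice@backup.example.invalid/backups/alice"
print(" ".join(backup_command("/home/alice", TARGET, "AAAA1111", "BBBB2222")))
print(" ".join(restore_command(TARGET, "/tmp/notes.txt",
                               file_to_restore="docs/notes.txt")))
```

Passing one of these lists to `subprocess.run` would perform the actual backup or restore. Keeping construction separate from execution makes it easy to inspect the command before scheduling it for unattended use.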

Comments
Duplicity for Windows
Those with Windows clients should check out the wonderful Duplicati implementation.
Actually, you will also have
Actually, you will also have the added complication of file system issues if backing up the forked HFS+ file system on the Mac to the single fork file system on the Linux box.
Dave from nanovor game
I was not able to cover every aspect.
I was not able to cover every aspect. Getting it working on Mac OS X is pretty close to what is required for getting it working on Linux. However, Windows is a completely different experience; it required a huge amount of work on my part, and I have not had a chance to write it all up yet in final form. Work has been consuming most of my time as of late, but I am still trying to get something out to help people like yourself.
Kevin Horn - club penguin
awesome
This article is fantastic. Great work. Just what I needed to jumpstart my move to this solution without having to learn too much before I get it working.
Thanks again.
-N
Any updates on sourcing of components?
Andrew:
Are there any updates on sourcing of components and their features?
I started by looking at
I started by looking at small form-factor motherboards that I might use. I had used Mini-ITX motherboards in a number of other projects and knew that there was close to full Linux support for it. Given that this project did not require a fast CPU, I decided on the EPIA Mini-ITX ML8000A motherboard, which has an 800MHz CPU, a 100Mb network interface and one 32-bit PCI slot built in to it.
Unclear
I am having difficulty understanding what you are specifically referring to. If you are referring to the hardware and its functionality, not much has changed since the article was released. If not, please drop me an e-mail at cyphactor@socal.rr.com with further questions.
Is something missing....?
When I read this article, I was led to believe that since the author has "12 computers, which run a combination of Linux, Mac OS X and Windows. Losing my work is unacceptable!" we were going to see a solution that provided for backup of all the OSes he listed. Unfortunately, it appears that only Linux-like OSes are supported. Foiled again!
Patrick
Try BackupPC
You may want to check out BackupPC here. I've done a write-up here about integrating Windows Active Directory clients with the BackupPC server.
Limitations of Reality
You are correct: the article did lead you to believe that I have 12 computers running a variety of operating systems (Linux, Mac OS X and Windows). The limitation of reality is that there is a word limit for articles. Hence, I was not able to cover every aspect. Getting it working on Mac OS X is pretty close to what is required for getting it working on Linux. However, Windows is a completely different experience; it required a huge amount of work on my part, and I have not had a chance to write it all up yet in final form (if I can remember all that I did). Work has been consuming most of my time as of late, but I am still trying to get something out to help people like yourself. My ultimate goal is to expand this current solution into a more complete, feature-filled solution that is pretty trivial to set up. Sadly, it isn't there yet, but it is on the back burner. If you have any questions, feel free to e-mail me at cyphactor@socal.rr.com.
Backup for Windows
Maybe a solution for your Windows machine is a free program called Cobian Backup (http://www.educ.umu.se/~cobian/cobianbackup.htm). It works very well.
Best regards.
Tabare
Rsync backup for Windows to a Linux server
Not that rsync is the best solution out there (I do really like the Duplicity backup solution outlined above), but there is a way to use Cygwin and rsync to back up to a Linux server.
Check it out here: http://www.gaztronics.net/rsync.php. I have not tried it, but I may if I cannot get Duplicity to play well with Cygwin.
Try using this page--Running Duplicity in Cygwin
I haven't set this up yet, but tomorrow's the day. I will try to post to let you know how it goes. See this site for instructions on running duplicity in Cygwin. I don't see why it wouldn't work.... http://katastrophos.net/andre/blog/2006/04/03/duplicity-042-on-cygwin/