Reliable, Inexpensive RAID Backup
I'm not going to get into a discussion of the different RAID levels. Suffice it to say that for my purposes RAID-5 fit the bill: it provides more storage space than a single drive, and it can survive, and recover from, the failure of any one drive.
When dealing with requirements like mine, I really don't see any need for hardware RAID. I don't need speed: the backups run when the LAN is usually idle, and the only other load on the machine is the SETI@home client running in the background.
The goal here was to install as plain a Linux system as possible so that, if the backup server's root filesystem ever failed, it could be reinstalled with a minimum of hassle. I have several systems running Red Hat 8.0, so I chose it as my distribution. The instructions, though, should apply to any modern Linux distribution that has RAID support enabled in the kernel by default.
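As a rough worked example of the capacity trade-off (the drive count and sizes here are hypothetical, not the author's hardware): RAID-5 spends one drive's worth of space on parity, so with n equal drives of size s, normally at least three, the usable capacity is

$$
C_{\text{usable}} = (n - 1)\,s, \qquad n \ge 3
$$

For three 40GB drives that gives (3 - 1) x 40GB = 80GB of usable space, twice what a single drive holds, and the contents of any one failed drive can be rebuilt from the remaining n - 1.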
I did a minimal install of Red Hat 8.0, selecting individual packages and turning off everything that didn't look important. RH may call it a minimal install, but it still includes a number of things you probably don't need. Check the box that says select all packages, then go through the list and turn them off. If you turn off too much, the configuration program will resolve the dependencies before the final install and prompt you with a list of packages that need to be added.
Use Disk Druid to partition your drives. On each drive that will be used in the RAID, create a single partition of type Software RAID that covers the full drive. Remember to configure another drive or partition for the root filesystem, swap and /boot. RAID systems can be booted from a root partition that lives on the RAID, but it is a bit tricky to set up, and I wanted to keep this as straightforward as possible.
To create the RAID device, select the RAID button from the choices in Disk Druid. The partitions you created as Software RAID will be selected by default. Enter a mountpoint (I used /backup) and the RAID level (5 in my case, really the only option that makes sense to me). Format it with your favorite journaling filesystem. I used ext3 for my system, but ReiserFS should work equally well. I tend to prefer ext3 to ReiserFS mostly because it is backward-compatible with ext2. This way, if anything happens to the journal, I can still access the data as an ext2 filesystem.
Continue with a normal install. You can put as much or as little on the system as you want. I selected the minimal install and had to install the samba-common, samba-client and cups-libs packages before smbmount could be used to back up Windows machines.
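Disk Druid builds the array at install time. For readers setting one up after installation, a rough command-line equivalent on a current distribution uses mdadm. The sketch below only prints the commands instead of running them, because they destroy whatever is on the named partitions; the function name raid5_plan and the device names are hypothetical examples, not part of the author's setup.

```sh
#!/bin/sh
# Print (but do not run) the commands that would build a RAID-5 array
# from the given partitions and put an ext3 filesystem on it.
# Usage: raid5_plan /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
raid5_plan() {
    md=$1
    shift
    # RAID-5 is normally built from at least three members.
    if [ "$#" -lt 3 ]; then
        echo "RAID-5 needs at least three partitions" >&2
        return 1
    fi
    echo "mdadm --create $md --level=5 --raid-devices=$# $*"
    echo "mkfs.ext3 $md"
}
```

After reviewing the printed plan, the commands can be run by hand as root, and the array mounted on /backup.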
Reboot your system and confirm RAID is running by entering df to see what filesystems are mounted and what their capacities are. Here's my current output:
| Filesystem | 1K-blocks | Used | Available | Use% | Mounted on |
|---|---|---|---|---|---|
| /dev/sda1 | 3534096 | 544004 | 2810568 | 17% | / |
| /dev/md0 | 59114404 | 47497448 | 8614040 | 85% | /backup |
| none | 386744 | 0 | 386744 | 0% | /dev/shm |
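df shows that the array is mounted, but not whether every disk in it is still working. The Linux kernel reports md array state in /proc/mdstat, where a status field such as [UUU] has one U per working member and an underscore for a missing one. A minimal sketch of a health check that could run from root's cron, assuming the array is md0 and the usual two-line mdstat layout; md_healthy is a hypothetical helper name:

```sh
#!/bin/sh
# Exit status 0 if the named md array is active and no member is missing.
# Expected mdstat layout (assumption):
#   md0 : active raid5 sdd1[2] sdc1[1] sdb1[0]
#         ... blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
# Usage: md_healthy [device] [mdstat-file]
md_healthy() {
    dev=${1:-md0}
    file=${2:-/proc/mdstat}
    awk -v dev="$dev" '
        # First line of the array: remember that we found it and
        # whether the kernel lists it as active.
        $1 == dev { found = 1; active = ($3 == "active"); next }
        # Next status field like [UUU]; "_" marks a failed or missing disk.
        found && /\[[U_]+]/ { ok = active && ($NF !~ /_/); exit }
        END { exit !(found && ok) }
    ' "$file"
}
```

Something like md_healthy md0 || echo "md0 is degraded" in a cron job would then mail a warning through cron's MAILTO.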
/dev/md0 is the RAID device, and as you can see I've done a good job of filling it with backups. Which brings me to the next step--actually backing up your systems. I use rsync and SSH, along with smbmount, for my backups. Set up your systems so that the root user on the backup system can log in as root on every system that needs to be backed up, without being asked for a password.
Do this by generating a key pair on the backup machine with ssh-keygen -t dsa, leaving the passphrase empty, and then appending the resulting .ssh/id_dsa.pub file to the .ssh/authorized_keys2 file on each system to be backed up. This gives the backup system access to all of the target system's files. If you only need to back up a subset of the files, you could use a user other than root on the target system.
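Copying the public key over an existing authorized_keys2 file would wipe any keys already there, so it is safer to append it, and only once. A minimal sketch of a helper to run on each target; the name add_key_once and the temporary file path are hypothetical. Note that newer OpenSSH releases disable DSA keys by default, so on a current system you would likely substitute -t ed25519 and id_ed25519.pub.

```sh
#!/bin/sh
# Append a public key to an authorized_keys file unless it is already
# present, creating the directory and file with strict permissions.
#
# Typical use (commands shown as comments, not run here):
#   on the backup host:  ssh-keygen -t dsa -N '' -f /root/.ssh/id_dsa
#                        scp /root/.ssh/id_dsa.pub root@target.home:/tmp/backup_key.pub
#   on the target:       add_key_once /tmp/backup_key.pub /root/.ssh/authorized_keys2
add_key_once() {
    pubkey=$1     # public key file copied from the backup host
    authfile=$2   # authorized_keys file on the target
    dir=$(dirname "$authfile")
    mkdir -p "$dir"
    chmod 700 "$dir"
    touch "$authfile"
    chmod 600 "$authfile"
    # -x: whole-line match, -F: fixed strings, -f: patterns read from the key file
    if ! grep -qxF -f "$pubkey" "$authfile"; then
        cat "$pubkey" >> "$authfile"
    fi
}
```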
Because this system has access to all of your other systems, it needs to be as secure as possible. Don't run any other services on it, and make sure you always use SSH to log into the machine, so its root password isn't exposed to the rest of the network.
I use rsync to handle the copying of only the files that have changed since the last backup. This program efficiently calculates the differences and transfers the changes, saving time and bandwidth. With rsync I am able to do nightly backups of my colocated web server--after an initial eight-hour backup of the base system over my 256KB cable modem connection.
I modified an rsync backup script by tridge@linuxcare.com to fit my needs. It creates a lockfile to prevent two instances from running at the same time, which could otherwise happen if something hangs during a backup. It also dumps a list of all the RPMs installed on the target system into a file in the target's /etc/ directory, using this command:
ssh root@target.home "rpm -qa > /etc/rpm_qa.txt"
This way you know what RPMs were installed on the system.
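The lockfile idea can be sketched in a few lines of shell. This is one common way to do it, not the code from the author's listing: it uses mkdir as the lock, because creating a directory either succeeds or fails atomically, so two instances cannot both take it. The lock path and the function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical lock location; any local path writable by root will do.
LOCKDIR=${LOCKDIR:-/var/run/linux_inc.lock}

# Take the lock, or report that another backup is still running.
acquire_lock() {
    if mkdir "$LOCKDIR" 2>/dev/null; then
        echo $$ > "$LOCKDIR/pid"   # record who holds the lock
        return 0
    fi
    echo "backup already running (lock $LOCKDIR exists)" >&2
    return 1
}

release_lock() {
    rm -rf "$LOCKDIR"
}

# Typical use at the top of a backup script:
#   acquire_lock || exit 1
#   trap release_lock EXIT
```

If a run is killed outright (kill -9 or a crash), the lock directory stays behind and later runs refuse to start until it is removed by hand, which at least makes a hung backup visible in the nightly mail.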
The script uses rsync's backup-dir feature to create daily directories that contain the files that have changed. This way you end up with a complete, current backup plus seven directories, named after the days of the week, holding the previous versions of the files that were changed or deleted on that day. This is much easier to restore from than an old-fashioned full backup plus incremental changes.
The script could be modified to fit a different backup schedule by changing the way the directory used by the backup-dir argument is named. See the associated listing, linux_inc, for the script to handle backing up Linux machines.
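A minimal sketch of this rotation, assuming a layout of /backup/<machine>/current for the full mirror and /backup/<machine>/<weekday> for the incrementals. The paths and function names are hypothetical and this is not the author's linux_inc listing. rsync's --backup and --backup-dir options move any file that would be overwritten or deleted into the named directory. tridge's original script empties last week's directory by syncing an empty directory over it, because its destination was a remote rsync server; with a local destination, rm -rf does the same job.

```sh
#!/bin/sh
# Directory for this run's incremental copies. The default %A (weekday
# name, locale dependent) gives a 7-day rotation; %d (day of month)
# would give a 31-slot rotation instead.
incr_dir() {
    base=$1
    fmt=${2:-%A}
    printf '%s/%s\n' "$base" "$(date +"$fmt")"
}

# Mirror SRC into DEST/current over SSH; old versions of changed or
# deleted files go into today's incremental directory.
run_backup() {
    src=$1    # e.g. root@target.home:/
    dest=$2   # e.g. /backup/target
    if [ -z "$src" ] || [ -z "$dest" ]; then
        echo "usage: run_backup SRC DEST" >&2
        return 1
    fi
    day=$(incr_dir "$dest")
    # Clear last week's copy of today's directory so it only holds
    # today's changes.
    rm -rf "$day"
    mkdir -p "$day" "$dest/current"
    rsync -a --delete --backup --backup-dir="$day" -e ssh \
        "$src" "$dest/current/"
}
```

For a whole-system backup you would also want excludes such as --exclude=/proc, since that tree is generated by the kernel rather than stored on disk.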
For Windows systems (I have only one, my wife's computer), I mount the Windows shares on the backup system using smbmount, and then use rsync on the local filesystem to make the backup. See the associated listing, windows_inc, for the backup script to handle Windows machines.
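A rough sketch of that flow with error checking, not the author's windows_inc listing. The share, mount point, destination and user name are hypothetical placeholders. smbmount and smbumount come from Samba's smbfs tools; on current systems mount -t cifs has taken their place. The key point is refusing to run rsync --delete against a directory that failed to mount or came up empty, since an empty source would make --delete remove the whole archive.

```sh
#!/bin/sh
# Hypothetical placeholders; adjust for your network.
SHARE=${SHARE:-//wifepc/c}
MNT=${MNT:-/mnt/wifepc}
DEST=${DEST:-/backup/wifepc}

# True only if the directory exists and contains at least one entry.
source_looks_valid() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

backup_windows() {
    mkdir -p "$MNT" "$DEST/current"
    if ! smbmount "$SHARE" "$MNT" -o username=backup; then
        echo "smbmount of $SHARE failed; skipping backup" >&2
        return 1
    fi
    if source_looks_valid "$MNT"; then
        rsync -a --delete "$MNT/" "$DEST/current/"
        status=$?
    else
        echo "$MNT is empty; skipping backup" >&2
        status=1
    fi
    smbumount "$MNT"
    return $status
}
```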
All of this is automated with a crontab:
    MAILTO=backupadmin@yourdomain.home
    # Backup the windows machine at 7pm
    0 19 * * * /backup/scripts/windows_inc
    # Backup Linux machine at 2am
    0 2 * * * /backup/scripts/linux_inc
In the scripts provided, search for "target" and replace it with your machine's name or IP address to customize the script for your setup. Make a separate copy for each machine to back up, and add it to root's crontab using crontab -e.
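Making the per-machine copies can itself be scripted. A minimal sketch with sed, where make_copy and the naming scheme <script>_<host> are hypothetical. It replaces every occurrence of "target", so a string such as root@target.home becomes root@<host>.home; pass just the part that should replace the word, and read the generated copy before adding it to the crontab.

```sh
#!/bin/sh
# Create TEMPLATE_HOST from TEMPLATE with every "target" replaced by HOST,
# make it executable, and print the new file name.
# Usage: make_copy /backup/scripts/linux_inc webserver
make_copy() {
    template=$1
    host=$2
    out="${template}_${host}"
    # The host must not contain "/" or "&", which sed would interpret.
    sed "s/target/${host}/g" "$template" > "$out" && chmod 700 "$out" &&
        printf '%s\n' "$out"
}
```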
The last feature of the system is automated shutdown when the power fails. The system uses an Asus P5A motherboard with an ATX power supply, so it is capable of shutting itself off. I have it connected to an APC 500 power backup with a USB connection.
I installed the latest version of apcupsd to handle shutting down the system when the power has been out for two minutes. The ext3 filesystem and the RAID should be able to prevent any data corruption without a UPS attached, but why take the chance?
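For reference, a sketch of the apcupsd.conf settings that would express "shut down after two minutes on battery" for a USB-connected APC unit. The directive names are from apcupsd's configuration file; the exact DEVICE setting for USB varies between apcupsd versions, so it is left out here, and the file path may differ by distribution.

```
## /etc/apcupsd/apcupsd.conf (excerpt)
UPSCABLE usb
UPSTYPE usb
# Shut down after 120 seconds on battery. BATTERYLEVEL and MINUTES
# still apply, and whichever limit is reached first triggers shutdown.
TIMEOUT 120
```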
My system has been running backups for about a month. Nightly reports detailing the files backed up are e-mailed to me by root's cron jobs. The only hitch I ran into was that when the Windows machine was off, the script would delete the archive--not a good thing! So I added error checks to the smbmount step, and now it does not try to do a backup if mounting the Windows shares fails.
Hopefully this article has convinced you that automated backups can be done with a minimum of hassle. It is possible to remove much of the human element from the backup process, but not completely. You still need to monitor your system to make sure things are running smoothly.