Reliable, Inexpensive RAID Backup
As a topic, backups are one of those subjects likely to elicit as many answers as there are people you ask. It is as personal a choice as your desktop configuration or your operating system, so in this article I am not even going to attempt to cover all the options. Instead, I describe the methods I use for building a reliable, useful backup system. This solution is not the right answer for everyone, but it works well for my situation.
Everyone knows they should be doing backups. But do you? How many times have you started a backup schedule only to let it slide after a few weeks? Sounds a bit like an exercise or diet regime, doesn't it?
I had several goals when designing a new backup system for my home and colocated web server: reliability of stored data, automation of the backup process and relatively low cost. Human error is the weakest element of any backup system, so a 100% hands-off system was my goal.
In "Scary Backup Stories", Paul Barry discusses failed backups. The common thread of his stories was that somewhere in the chain of events, a person had forgotten a very important step. The first story he tells highlights how one team forgot to format the tapes. They had religiously followed their backup plan, backing up onto unformatted tapes, only to discover the tapes were useless.
I did some reading and settled on a RAID-5 array of hard drives as the most reliable way to store data. It can survive a single drive failure and recover from it when you replace the failed drive. Unlike tape, CD-R or DVD backups, it doesn't need someone to swap media or format and rotate tapes. RAID-5 cannot survive a two-drive failure; RAID-6 and nested arrays can, but they need more drives, so for a budget build like this one RAID-5 is as good as it gets.
RAID-5 achieves its reliability by striping the data across a number of disks, along with parity information computed from that data. The parity is spread across the drives in such a way that no single-disk failure can destroy the archive, and when you replace the failed drive, the array rebuilds the data that was on it from the surviving drives.
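To make the parity idea concrete, here is a minimal Python sketch of how a missing block can be recomputed. It assumes the usual XOR parity and, for simplicity, keeps the parity block in one fixed position; real RAID-5 rotates the parity block among the drives from stripe to stripe and works on fixed-size disk blocks rather than short strings. The function names are illustrative, not part of any RAID tool.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block (XOR of all data blocks)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe, failed_index):
    """Recover the block at failed_index by XOR-ing every surviving block."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)

# Three-drive array: two data blocks plus one parity block per stripe.
stripe = make_stripe([b"HELLO", b"WORLD"])
print(rebuild(stripe, 1))   # pretend drive 1 died; prints b'WORLD'
```

Because XOR is its own inverse, the same operation that creates the parity also recovers any one lost block, which is exactly why a second failure in the same stripe is fatal: there is no longer enough information to solve for two unknowns.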
The base system would be my recently retired colocated web server box. It has a nice rackmount case, a 400MHz AMD processor and 768MB of RAM. I added a beefier power supply (Antec 350W from Best Buy) to replace the 250W unit that came with the case. The system already had a SCSI controller and a 5GB SCSI drive that I'd be using for the root filesystem. Yes, 5GB is small by today's standards, but this system was built and installed in 1999. It ran without failure until December 2002, when it was retired because the ISP went out of business. A minimal install of Red Hat 8 takes about 400MB, so this drive works just fine for its new purpose.
SCSI usually is the first choice for reliable RAID hardware, but it is expensive: not only the drives but the controllers, too. Another reason to choose SCSI is speed, because SCSI handles multiple simultaneous accesses more efficiently than IDE does. But for my application, speed wasn't a deciding factor.
IDE RAID controllers are becoming more affordable but are still in the $200+ price range as of this writing. A less expensive alternative is to add several IDE controller cards to the system and put one drive per channel (two drives per card) on them. These PCI IDE cards cost less than $25 each, and they support the newer ATA/133 (133MB/s) transfer mode.
I chose to install two PCI cards for use as RAID controllers. This left the IDE controllers on the motherboard free for adding other drives at a later time. They also could be used to quickly back up a drive that I didn't want to copy over the network.
There are two good reasons for putting only a single drive on each channel. First, if one drive fails, it can disrupt the other drive on the channel, causing a catastrophic two-drive failure. The second reason is speed: as I understand it, only one device on an IDE channel can be active at a time, so two drives on the same channel share, and roughly halve, its throughput. An argument also can be made for using only one drive per controller card. At that point, though, you might as well invest in a dedicated RAID card.
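Keeping to one drive per channel is easy to get wrong when assigning devices. Under the classic Linux IDE naming, drives are lettered in master/slave pairs per channel (hda/hdb on the first channel, hdc/hdd on the second, hde/hdf on the third, and so on), so a quick check can flag array members that share a channel. This is a minimal sketch assuming that naming scheme; which letters your add-on cards actually receive depends on the order in which the kernel probes them, and the helper names are hypothetical.

```python
def ide_channel(device):
    """Map a classic Linux IDE name such as '/dev/hde1' to its channel number.

    Assumes hdX naming: hda/hdb share channel 0, hdc/hdd share channel 1,
    hde/hdf share channel 2, and so on (master first, then slave).
    """
    name = device.rsplit("/", 1)[-1]            # e.g. 'hde1'
    if not name.startswith("hd") or len(name) < 3 or not name[2].isalpha():
        raise ValueError(f"not an IDE device name: {device}")
    return (ord(name[2]) - ord("a")) // 2

def shared_channels(members):
    """Return {channel: [devices]} for every channel used by more than one member."""
    by_channel = {}
    for dev in members:
        by_channel.setdefault(ide_channel(dev), []).append(dev)
    return {ch: devs for ch, devs in by_channel.items() if len(devs) > 1}

# One master per channel across two add-on cards: no conflicts.
print(shared_channels(["/dev/hde1", "/dev/hdg1", "/dev/hdi1"]))   # {}
# hde and hdf are master and slave on the same channel: flagged.
print(shared_channels(["/dev/hde1", "/dev/hdf1", "/dev/hdg1"]))   # {2: ['/dev/hde1', '/dev/hdf1']}
```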
My drive choice had already been made. For some time, I'd been using a second Maxtor drive in each of my systems as a backup drive, mirroring the live filesystem to it with rsync. And I have been using Maxtor drives for years without a single failure, unlike Fujitsu drives, which seem to drop dead within a year (I have three of them in the junk box). I suppose this means that as soon as this article is published, all of my reliable drives will fail at the same time.
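Barry's tape story is a reminder that a backup nobody checks may not be a backup at all. The following sketch compares a source tree with its mirror by SHA-256 checksum and reports files that are missing, extra or different. It is an illustration rather than the author's tooling: it assumes both trees are mounted locally, compares only regular files (symlinks and device files are skipped), and the function names are hypothetical.

```python
import hashlib
import os
import tempfile

def tree_digests(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # this sketch skips symlinks, sockets, device files, etc.
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files never have to fit in RAM.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests

def compare_trees(source, mirror):
    """Return sorted (missing, extra, changed) lists of relative paths."""
    src, dst = tree_digests(source), tree_digests(mirror)
    missing = sorted(src.keys() - dst.keys())   # in the source but not the mirror
    extra = sorted(dst.keys() - src.keys())     # only in the mirror
    changed = sorted(p for p in src.keys() & dst.keys() if src[p] != dst[p])
    return missing, extra, changed

# Demo on two throwaway directories standing in for the live and backup trees.
with tempfile.TemporaryDirectory() as live, tempfile.TemporaryDirectory() as backup:
    for root in (live, backup):
        with open(os.path.join(root, "notes.txt"), "w") as f:
            f.write("same contents")
    with open(os.path.join(live, "new.txt"), "w") as f:
        f.write("not mirrored yet")
    print(compare_trees(live, backup))   # (['new.txt'], [], [])
```

A dry run of rsync with its --checksum option is another way to spot the same kind of differences without writing any code.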
You need at least three drives for a RAID-5 system. The drives all should be the same size, because the usable capacity is the size of the smallest drive multiplied by one less than the number of drives. So three 30GB drives yield a RAID-5 with about 60GB of storage. At the time, I had two 40GB drives and one 30GB drive on hand, so I wasted about 20GB of space in the interest of getting the system up and running as quickly as possible.
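The capacity rule is simple arithmetic: usable space = (smallest drive) × (number of drives − 1). A small helper, with a hypothetical name, reproduces the numbers above and also reports the space left unused on the larger drives. Actual formatted capacity will be somewhat lower because of filesystem overhead.

```python
def raid5_capacity(drive_sizes_gb):
    """Return (usable, wasted) space for a RAID-5 array, in the input's units.

    Every member contributes only as much as the smallest drive, and one
    drive's worth of that space holds parity: usable = smallest * (n - 1).
    """
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID-5 needs at least three drives")
    smallest = min(drive_sizes_gb)
    usable = smallest * (len(drive_sizes_gb) - 1)
    wasted = sum(size - smallest for size in drive_sizes_gb)
    return usable, wasted

print(raid5_capacity([30, 30, 30]))   # (60, 0)
print(raid5_capacity([40, 40, 30]))   # (60, 20): the author's mixed set of drives
```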
It may be possible to resize the array by adding more drives at a later time, but unless you have a second backup of the data, you probably don't want to try this. Instead I'd recommend buying a larger drive, copying the RAID to it and rebuilding the RAID filesystem from scratch.
Comments
Raid is not a backup solution & the failure of hardware raid
RAID is not a good backup solution, as it does not protect against every single point of failure. Notably, a failure in a hardware RAID controller card or in a power supply can cause data loss. Not to mention there's only a single, always-up-to-date copy, so such a backup is no good for rolling back to recover from operator error.
On another note, hardware RAID (RAID in the controller) has a problem: if your controller ever fails, you'd better have another compatible, working controller. This means that if you want real reliability, you'd better not use any hardware RAID controller of which you own only one. Otherwise, you may someday find yourself on eBay looking for another RAID controller that will read your disks.
Re: Raid is not a backup solution & the failure of hardware raid
A RAID drive backup is not a backup; it is merely protection against an isolated hard drive failure. When was the last time you had a hard drive totally fail? Human error is often the cause of major data loss, e.g., we make some type of mistake (a virus, or a partitioning error) and end up deleting all our files. A RAID drive won't get YESTERDAY'S or last week's files back.
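The point here is that a mirror or RAID array holds only the current state of the files. One common remedy, not described in the article, is to keep dated snapshots and prune them on a schedule. Below is a minimal sketch of just the pruning decision, assuming one snapshot per day identified by its date; the 7-day/4-week policy and the function name are hypothetical choices for illustration.

```python
from datetime import date, timedelta

def snapshots_to_keep(snapshot_dates, today, daily=7, weekly=4):
    """Return the sorted snapshot dates an illustrative retention policy keeps.

    Policy (an assumption, not a standard):
      * keep every snapshot less than `daily` days old;
      * for snapshots at least `daily` days but less than `weekly` weeks old,
        keep only the newest one in each ISO calendar week;
      * prune anything older.
    """
    daily_cutoff = today - timedelta(days=daily)
    weekly_cutoff = today - timedelta(weeks=weekly)
    keep = set()
    newest_in_week = {}
    for d in snapshot_dates:
        if d > daily_cutoff:
            keep.add(d)
        elif d > weekly_cutoff:
            week = d.isocalendar()[:2]          # (ISO year, ISO week number)
            if d > newest_in_week.get(week, date.min):
                newest_in_week[week] = d
    keep.update(newest_in_week.values())
    return sorted(keep)

today = date(2003, 3, 31)
history = [today - timedelta(days=n) for n in range(40)]   # one snapshot per day
kept = snapshots_to_keep(history, today)
print(len(kept), "of", len(history), "snapshots kept")
```

Everything not returned would be deleted; yesterday's files stay available for a week, and last month's survive as weekly copies.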
Re: Raid is not a backup solution & the failure of hardware raid
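RAID 10 can survive two disks failing, as long as they are not both halves of the same mirror. RAID 10 is a stripe of mirrored pairs (RAID 1 plus RAID 0); mirroring RAID-5 arrays is a different layout, sometimes called RAID 51.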
Re: Reliable, Inexpensive RAID Backup
A server at work had a hardware SCSI RAID-5 controller with hot-swappable disks and no other backup. One drive died. It was removed and a spare drive was plugged in. End of problem? No, the controller crashed. All data lost. It took weeks to recover, and some data was gone forever. A freak occurrence, yes, but can you afford to have it happen to you? Defense in depth, they say. Offline and offsite backup is still a good idea.
Re: Reliable, Inexpensive RAID Backup
Quite right. I have seen this type of thing twice, where it is the RAID controller that freaks out, and then it does not matter how many disks you have; your data is gone. The problem was an incompatible firmware version between the disks and the RAID controller (the manufacturer was HP), so if you go this route, update all your firmware before you use their kit!
I much prefer mirrored disks for small setups like the author is describing - an IDE Adaptec RAID1 card is cheap and a couple of large disks can store a lot of data.
Taking the whole lot off site is crucial though.
Re: Reliable, Inexpensive RAID Backup
Whilst a single RAID cannot survive a multiple drive failure, a RAID of RAIDs can. For instance, a RAID5 of RAID5 arrays will survive a 2 disk failure, a 3 level nesting will survive 3 disks (given that you have proper redundancy in controllers), etc.
Similarly, mirrored RAID5 arrays (2 copies) will survive any 2 drive failure, and a triple mirror of RAID5 arrays will survive a 3 disk failure.
RAID6 can also survive multiple disk failures: it uses two independent sets of distributed parity, so any two disks can fail without data loss.
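The survivability claims about nested arrays can be checked by brute force: enumerate every combination of failed disks and ask whether the layout still holds its data. This is a minimal sketch assuming idealized arrays (no controller failures, no second failure during a rebuild) built from hypothetical three-disk RAID-5 groups; the function names are made up for illustration.

```python
from itertools import combinations

def raid5_ok(failed_members):
    """A RAID-5 set survives with at most one failed member."""
    return failed_members <= 1

def raid6_ok(failed_members):
    """A RAID-6 set survives with at most two failed members."""
    return failed_members <= 2

def raid5_of_raid5_ok(failed, groups):
    """Outer RAID-5 over inner RAID-5 groups: at most one inner group may die."""
    dead_groups = sum(1 for g in groups if not raid5_ok(len(failed & g)))
    return raid5_ok(dead_groups)

def mirrored_raid5_ok(failed, copies):
    """RAID-1 over RAID-5 copies: at least one copy must still be intact."""
    return any(raid5_ok(len(failed & c)) for c in copies)

def worst_case(n_disks, survives):
    """Largest k such that every possible k-disk failure is survivable."""
    k = 0
    while k < n_disks and all(
        survives(set(c)) for c in combinations(range(n_disks), k + 1)
    ):
        k += 1
    return k

# Nine disks as three inner RAID-5 groups of three, under an outer RAID-5.
groups = [set(range(0, 3)), set(range(3, 6)), set(range(6, 9))]
print(worst_case(9, lambda f: raid5_of_raid5_ok(f, groups)))   # 3
# Six disks as two mirrored three-disk RAID-5 copies.
copies = [set(range(0, 3)), set(range(3, 6))]
print(worst_case(6, lambda f: mirrored_raid5_ok(f, copies)))   # 3
print(worst_case(6, lambda f: raid6_ok(len(f))))               # 2
print(worst_case(6, lambda f: raid5_ok(len(f))))               # 1
```

Both nested layouts come out at three, so the figures in the comment are safe lower bounds. The price is drive count: nine disks in a RAID-5 of RAID-5s hold only four disks' worth of data.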
Re: Reliable, Inexpensive RAID Backup
Ah! Now that sounds interesting. I hadn't considered making RAIDs out of already existing RAIDs. Thanks for the new information.
brian
Re: Reliable, Inexpensive RAID Backup
This method is good if you have no long term need to archive the data, and you don't need disaster recovery from fire. If you need long term archives or off site storage this just won't do.
Good for local storage though.
Re: Reliable, Inexpensive RAID Backup
There is a method whereby RAID can survive a 2 disk (or more) failure.
It is commonly called RAID 1+0 (or, in another incarnation, 0+1).
In theory, if it is set up correctly, you can lose two disks (as long as they are not in the same mirrored pair) before you start to lose data.
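Whether RAID 1+0 survives a two-disk failure depends on which two disks fail: it is fatal only when both halves of one mirrored pair go. A brute-force count makes that concrete. This is a minimal sketch assuming disks are paired as (0, 1), (2, 3) and so on; the function names are hypothetical.

```python
from itertools import combinations

def raid10_ok(failed, pairs):
    """A stripe of mirrors survives as long as no mirrored pair has lost both disks."""
    return all(not pair <= failed for pair in pairs)

def two_disk_survival(n_pairs):
    """Return (survivable, total) two-disk failures for RAID 1+0 with n_pairs mirrors."""
    pairs = [{2 * i, 2 * i + 1} for i in range(n_pairs)]
    combos = list(combinations(range(2 * n_pairs), 2))
    ok = sum(1 for c in combos if raid10_ok(set(c), pairs))
    return ok, len(combos)

for n_pairs in (2, 3, 4):
    print(n_pairs * 2, "disks:", two_disk_survival(n_pairs))
# 4 disks: (4, 6)
# 6 disks: (12, 15)
# 8 disks: (24, 28)
```

So a four-disk RAID 1+0 survives two of every three possible two-disk failures, and the odds improve as pairs are added, but unlike RAID-6 it never survives every two-disk combination.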
Re: Reliable, Inexpensive RAID Backup
"Reliable, Inexpensive [and S L O W W W W] RAID Backup"
I've done this before and it's mental how much it can slow your machine down. Just get someone to rsync it off your machine. These guys look expensive, but they're Linux guys and cut me a deal since it's just my personal stuff and I'm an open source developer etc etc