Reliable, Inexpensive RAID Backup

Here's a way to implement backups that requires minimal human input--besides doing them, of course.

As a topic, backups is one of those
subjects likely to elicit as many answers as people you ask about
it. It is as personal a choice as your desktop configuration or
your operating system. So in this article I am not even going to
attempt to cover all the options. Instead I describe the methods I
use for building a reliable, useful backup system. This solution is
not the right answer for everyone, but it works well for my
situation.

Everyone knows they should be doing backups. But do you? How
many times have you started a backup schedule only to let it slide
after a few weeks? Sounds a bit like an exercise or diet regime,
doesn't it?

I had several goals when designing a new backup system for my
home and colocated web server: reliability of stored data,
automation of the backup process and relatively low cost. Human error
is the weakest element of any backup system, so a 100% hands-off
system was my goal.

In "Scary Backup Stories",
Paul Barry discusses failed backups. The common thread of his
stories is that somewhere in the chain of events a person forgot
a very important step. The first story he tells highlights how one
team forgot to format the tapes. They had religiously followed
their backup plan, backing up onto the unformatted tapes, only to
discover the tapes were useless.

I did some reading and settled on a RAID-5 array of hard
drives as the most reliable way to store data. It can survive a
single drive failure and recover from it when you replace the
failed drive. Unlike tape, CDR or DVD backups, it doesn't need
someone to swap media or format and rotate tapes. RAID-5 cannot
survive a two-drive failure; levels that can, such as RAID-6 or
nested arrays, need more drives, so for a small array RAID-5 is
about as good as it gets.

RAID-5 achieves its reliability by striping the data across a
number of disks, along with parity information. The
parity is spread in such a way that no single-disk failure can
destroy the archive. And when you replace the failed drive, the
array rebuilds the data that was on that drive.

Hardware Choices

The base system would be my recently retired colocated web
server box. It has a nice rackmount case, a 400MHz AMD processor
and 768MB of RAM. I added a beefier power supply (Antec 350W from
Best Buy) to replace the 250W unit that came with the case. The
system already had a SCSI controller and a 5GB SCSI drive that I'd
be using for the root filesystem. Yes, 5GB is small by today's
standards, but this system was built and installed in 1999. It ran
without failure until it was removed in December 2002, because the
ISP went out of business. The minimal install of Red Hat 8 takes
about 400MB, so this drive works just fine for its new
purpose.

SCSI usually is the first choice for reliable RAID hardware,
but it is expensive--not only the drives but the controllers, too.
Another reason to choose SCSI is speed: it handles multiple accesses to
the drive more efficiently than IDE does. But for my application
speed wasn't a deciding factor.

IDE RAID controllers are becoming more affordable but are
still in the $200+ price range as of this writing. A less expensive
alternative is to add several IDE controller cards to the system
and put one drive per channel (two drives per card) on them. These
PCI IDE cards are less than $25 each, and they support the newer
ATA/133 (133MB/s) transfer mode.

I chose to install two PCI cards for use as RAID controllers.
This left the IDE controllers on the motherboard free for adding
other drives at a later time. They also could be used to quickly
back up a drive that I didn't want to copy over the network.

There are two good reasons for limiting the array to a single
drive per channel. First, if one drive fails it can disrupt the
other drive on the channel, causing a catastrophic two-drive
failure. The other reason is speed. With two drives on an IDE
chain, the throughput is halved, as I understand it, so it makes
sense to use only a single drive. An argument also can be made for
using only one drive per controller card. At that point, though,
you might as well invest in a dedicated RAID card.

My drive choice had already been made. For some time, I'd
been using a second Maxtor drive in each of my systems as a backup
drive, mirroring the live filesystem to it with rsync. And I have
been using Maxtor drives for years without a single failure, unlike
Fujitsu drives, which seem to drop dead within a year (I have three
of them in the junk box). I suppose this means that as soon as this
article is published, all of my reliable drives will fail at the
same time.

You need at least three drives for a RAID-5 system.
The drives all should be the same size, because the total size is
calculated using the smallest drive size, multiplied by
the number of drives minus one. So, three 30GB drives
yield a RAID-5 of about 60GB of storage. At the time, I had two
40GB drives and one 30GB drive on hand. So I wasted about 20GB of space
in building this system in the interest of getting it up and
running as quickly as possible.

It may be possible to resize the array by adding more drives
at a later time, but unless you have a second backup of the data,
you probably don't want to try this. Instead I'd recommend buying a
larger drive, copying the RAID to it and rebuilding the RAID
filesystem from scratch.

Software Choices

I'm not going to get into a discussion of the different RAID
levels. Suffice it to say that for my purposes RAID-5 fit the bill.
It provides more storage space than a single drive and the
capability of surviving and recovering from a single-drive
failure.

When dealing with requirements like mine, I really don't see
any need to have hardware RAID. I don't need speed: the backups run
when the LAN is usually idle, and the only other load on the machine
is the SETI@home client running in the background.

The goal here was to install as plain a Linux system as
possible, so in the event of a failed RAID root filesystem, it
could be reinstalled with a minimum of hassle. I have several
systems running on Red Hat 8.0, so I chose it as my distribution.
The instructions, though, should apply to any modern Linux
distribution that has RAID support enabled in the kernel by
default.

I did a minimal install of Red Hat 8.0, selecting individual
packages and turning off everything that didn't look important. RH
may call it a minimal install, but it still includes a number of
things you probably don't need. Check the box that says select all
packages, then go through the list and turn them off. If you turn
off too much, the configuration program will resolve the
dependencies before the final install and prompt you with a list of
packages that need to be added.

Use Disk Druid to partition your drives. For the drives that
will be used in the RAID, set the partition type to software RAID and select a
partition size that covers the full drive. Remember to configure
another drive or partition to hold the root filesystem, swap and /boot.
RAID systems can be booted from a root partition that lives on the
RAID, but it is a bit tricky to set up, and I wanted to keep this
as straightforward as possible.

To create the RAID system, select the RAID button from the
choices in Disk Druid. The partitions you marked as software RAID
will be selected by default. Enter a mountpoint (I used /backup)
and the RAID level (5 in my case, really the only option that makes
sense to me). Format it with your favorite journaling filesystem. I
used ext3 for my system, but ReiserFS should work equally well.
I tend to prefer ext3 to ReiserFS mostly because it is
backward-compatible with ext2. This way, if anything happens to the
journal I can still access the data as an ext2 filesystem.

Continue with a normal install. You can put as much or as
little on the system as you want. I selected the minimal install
and had to install the samba-common, samba-client and cups-libs
packages before smbmount could be used to back up Windows
machines.

Reboot your system and confirm RAID is running by entering
df to see what filesystems are mounted and what
their capacities are. Here's my current output:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1              3534096    544004   2810568  17% /
/dev/md0              59114404  47497448   8614040  85% /backup
none                    386744         0    386744   0% /dev/shm

/dev/md0 is the RAID device, and as you can see I've done a
good job of filling it with backups. Which brings me to the next
step--actually backing up your systems. I use rsync and SSH along
with smbmount for my backups. Set up your systems so the root user
on the backup system can access root on all the systems that need
to be backed up. Set it up so the backup system's root user can log
in without being asked for a password.

Do this setup by generating a key pair on the backup machine
with ssh-keygen -t dsa, and then append the contents of the
.ssh/id_dsa.pub file to the .ssh/authorized_keys2 file on each of
the systems to be backed up. This authorizes the backup system to
access all of the target system's files. If you only need to back
up a subset of the files, you could use a user other than root on
the target system.

Because this system has access to all of your other systems,
it needs to be as secure as possible. Don't run any other services
on it, and make sure you always use SSH to log into the machine, so
its root password isn't exposed to the rest of the network.

I use rsync to handle the copying of only the files that have
changed since the last backup. This program efficiently calculates
the differences and transfers the changes, saving time and
bandwidth. With rsync I am able to do nightly backups of my
colocated web server--after an initial eight-hour backup of the
base system over my 256KB cable modem connection.

I modified an rsync backup script by tridge@linuxcare.com to
fit my needs. It creates a lockfile to prevent two instances from
running at the same time, which is a possibility if something hangs
during a backup. It dumps a list of all the RPMs installed on the
target system into a file in the target's /etc/ directory, using
this command:

ssh root@target.home "rpm -qa > /etc/rpm_qa.txt"

This way you know what RPMs were installed on the
system.

The script uses the --backup-dir feature of rsync to create
daily directories that hold the previous versions of files that
were changed or deleted. This way you end up with a current, full
and complete backup and seven directories, named after the days of
the week, with the files that were replaced on that day. This is
much easier to restore than an old-fashioned full backup plus
incremental changes.

The script could be modified to fit a different backup
schedule by changing the way the directory used by the --backup-dir
argument is named. See the associated listing, linux_inc, for the
script to handle backing up Linux machines.

For Windows systems (I have only one, my wife's computer) I
mount the Windows shares to the backup system using smbmount, and
then use rsync on the local filesystem to make the backup. See the
associated listing windows_inc for the backup script to handle
Windows machines.

All of this is automated with a crontab:

MAILTO=backupadmin@yourdomain.home
# Backup the windows machine at 7pm
0 19 * * *       /backup/scripts/windows_inc
# Backup Linux machine at 2am
0 2 * * *        /backup/scripts/linux_inc
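
As a rough illustration of the day-of-week rotation described above, here is a minimal sketch. It is not the linux_inc listing: the host name, the paths, the mkdir-based lock and the exclusion of /proc are all assumptions made for the example. The rsync options used (-a, --delete, --exclude, --backup, --backup-dir, -e) are standard ones.

```shell
#!/bin/sh
# Sketch of a day-of-week rotation around rsync --backup-dir.
# TARGET, BACKUP_ROOT and LOCKDIR are illustrative placeholders,
# not values taken from the linux_inc listing.
TARGET="${TARGET:-target.home}"
BACKUP_ROOT="${BACKUP_ROOT:-/backup/$TARGET}"
LOCKDIR="${LOCKDIR:-/tmp/backup-$TARGET.lock}"

# Print the directory that collects the old versions of files changed
# or deleted today, e.g. /backup/target.home/Monday.
# $1 = backup root, $2 = day name (defaults to today's name)
incremental_dir() {
    echo "$1/${2:-$(date +%A)}"
}

# mkdir either creates the directory or fails, so it can serve as a
# simple lock: a second run fails here while the first still holds it.
acquire_lock() { mkdir "$1" 2>/dev/null; }
release_lock() { rmdir "$1"; }

run_backup() {
    if ! acquire_lock "$LOCKDIR"; then
        echo "backup of $TARGET already running" >&2
        return 1
    fi
    inc="$(incremental_dir "$BACKUP_ROOT")"
    # Empty last week's directory for this weekday before reusing it.
    rm -rf "${inc:?}"
    mkdir -p "$inc" "$BACKUP_ROOT/current"
    # Files replaced or deleted in current/ are moved into $inc.
    rsync -a --delete --exclude=/proc/ \
        --backup --backup-dir="$inc" \
        -e ssh "root@$TARGET:/" "$BACKUP_ROOT/current/"
    status=$?
    release_lock "$LOCKDIR"
    return "$status"
}

# Only touch the network when called as "script run", so the helper
# functions can be sourced and tried out on their own.
if [ "${1:-}" = "run" ]; then
    run_backup
fi
```

Changing what incremental_dir prints (a date instead of a weekday name, say) is one way to get a different rotation schedule.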

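For the Windows case (mount the share with smbmount, then rsync from the local mountpoint), it seems worth checking that the mount really succeeded before rsync runs. The sketch below assumes the rsync step uses --delete, which the article does not state; the host, share, user and path names are hypothetical, and password handling is left out.

```shell
#!/bin/sh
# Sketch of a guarded Windows backup. WINHOST, SHARE, WINUSER,
# MOUNTPOINT and DEST are illustrative placeholders, not values
# taken from the windows_inc listing.
WINHOST="${WINHOST:-winbox}"
SHARE="${SHARE:-documents}"
WINUSER="${WINUSER:-backup}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/winbox}"
DEST="${DEST:-/backup/winbox}"

# Succeed only if $1 appears as a mount point in the mounts table.
# $2 lets a different table be supplied; it defaults to /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

backup_windows() {
    # Password handling is left out of this sketch.
    smbmount "//$WINHOST/$SHARE" "$MOUNTPOINT" -o "username=$WINUSER"
    if ! is_mounted "$MOUNTPOINT"; then
        # Without this check rsync --delete would see an empty
        # directory and remove everything from the archive.
        echo "cannot mount //$WINHOST/$SHARE, skipping backup" >&2
        return 1
    fi
    rsync -a --delete "$MOUNTPOINT/" "$DEST/current/"
    status=$?
    smbumount "$MOUNTPOINT"
    return "$status"
}

# Nothing is mounted or copied unless the script is called with "run".
if [ "${1:-}" = "run" ]; then
    backup_windows
fi
```

Because cron mails anything written to stderr, the "cannot mount" message would show up in the nightly report when the Windows machine is switched off.
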
In the scripts provided, search for "target" and replace
it with your machine's name or IP address to customize the script
for your setup. Make a separate copy for each machine to back up,
and add it to root's crontab using crontab -e.

The last feature of the system is automated shutdown when the
power fails. The system uses an Asus P5A motherboard with an ATX
power supply, so it is capable of shutting itself off. I have it
connected to an APC 500 UPS with a USB connection.

I installed the latest version of apcupsd to handle shutting
down the system when the power has been out for two minutes. The
ext3 filesystem and the RAID should be able to prevent any data
corruption without a UPS attached, but why take the chance?

My system has been running backups for about a month. Nightly
reports are e-mailed to me (from root's cron job) that detail the
files backed up. The only hitch I ran into was that when the Windows
machine was off, the script would delete the archive--not a good thing! So I
added error checks to the smbmount step, and now it does not try to do a
backup if mounting the Windows shares fails.

Hopefully this article has convinced you that automated
backups can be done with a minimum of hassle. It is possible to
remove much of the human element from the backup process, but not
completely. You still need to monitor your system to make sure
things are running smoothly.

Resources

Scary Backup Stories
Software RAID HOWTO
apcupsd
Brian Lane

Associated Listings

linux_inc: Incremental Linux System Backup
windows_inc: Incremental Windows System Backup

Brian Lane is a software
developer from Seabeck, Washington, where he lives with his wife
and son. When he isn't writing software for
www.shinemicro.com,
he is working on various Linux projects which can be found at
www.brianlane.com.

email: bcl@brianlane.com

______________________

Comments

Raid is not a backup solution & the failure of hardware raid

Anonymous

Raid is not a good backup solution as it does not protect against every single point of failure. Notably, failures in a hardware raid controller card or in power supplies can cause data loss. Not to mention there's only a single, always up to date, copy. Such a backup is no good for rolling back to recover from operator error.

On another note, hardware raid, raid in the controller, has a problem. If your controller ever fails, you'd better have another compatible, working, controller. This means if you want real reliability, you'd better not use any hardware raid controller of which you own only one. Otherwise, you may someday find yourself on eBay looking for another raid controller that will read your disks.

Re: Raid is not a backup solution & the failure of hardware raid

Anonymous

A RAID drive backup is not a backup; it is merely protection against an isolated hard drive failure. When was the last time you had a hard drive totally fail? Human error is often the cause of major errors, i.e., we make some type of mistake (e.g., a virus or a partitioning slip) and we end up deleting all our files. A RAID drive won't get YESTERDAY'S or last week's files back.

Re: Raid is not a backup solution & the failure of hardware raid

Anonymous

RAID 10 can survive two disks failing, as long as they are not both in the same mirrored pair. RAID 10 is RAID 0 striping across RAID 1 mirrors.

Re: Reliable, Inexpensive RAID Backup

Anonymous

A server at work had a hardware SCSI RAID-5 controller with hot swappable disks. No other backup. One drive died. It was removed and a spare drive was plugged in. End of problem? No, controller crashed. All data lost. It took weeks to recover. Some data was gone forever. A freak occurrence, yes, but can you afford to have it happen to you? Defense in depth they say. Offline and offsite backup is still a good idea.

Re: Reliable, Inexpensive RAID Backup

Anonymous

Quite right. I have seen this type of thing twice, where it is the RAID controller that freaks out, and then it does not matter how many disks you have: your data is gone. The problem was an incompatible firmware version between the disks and the RAID controller (manufacturer was HP) - so if you go this route then update all your firmware before you use their kit!

I much prefer mirrored disks for small setups like the author is describing - an IDE Adaptec RAID1 card is cheap and a couple of large disks can store a lot of data.

Taking the whole lot off site is crucial though.

Re: Reliable, Inexpensive RAID Backup

Anonymous

Whilst a single RAID cannot survive a multiple drive failure, a RAID of RAIDs can. For instance, a RAID5 of RAID5 arrays will survive a 2 disk failure, a 3 level nesting will survive 3 disks (given that you have proper redundancy in controllers), etc.

Similarly, mirrored RAID5 arrays (2 copies) will survive any 2 drive failure, and a triple mirror of RAID5 arrays will survive a 3 disk failure.

RAID6 might also be able to survive multiple disk failures by using two different sets of distributed parity.

Re: Reliable, Inexpensive RAID Backup

brianlane

Ah! Now that sounds interesting. I hadn't considered making RAIDs out of already existing RAIDs. Thanks for the new information.

brian

Re: Reliable, Inexpensive RAID Backup

Anonymous

This method is good if you have no long term need to archive the data, and you don't need disaster recovery from fire. If you need long term archives or off site storage this just won't do.

Good for local storage though.

Re: Reliable, Inexpensive RAID Backup

Anonymous

There is a method whereby RAID can survive a 2 disk (or more) failure.

It is commonly called RAID 1+0 (or, in another incarnation, 0+1).

In theory you can lose 2 sets of disks (if set up correctly) before you start to lose data.

Re: Reliable, Inexpensive RAID Backup

Anonymous

"Reliable, Inexpensive [and S L O W W W W] RAID Backup"

I've done this before and it's mental how much it can slow your machine down. Just get someone to rsync it off your machine. These guys look expensive, but they're Linux guys and cut me a deal since it's just my personal stuff and I'm an open source developer etc etc
