High Availability Linux with Software RAID

Turn your machine into an HA server after you test it out on a VMware setup.
Bootable CD Creation

Creating a bootable recovery CD is easy with the mkbootdisk utility. To include the /boot partition on the recovery CD, however, a small patch needs to be applied to mkbootdisk (Listing 1). You also must have the mkisofs package installed. The following commands, issued as root, apply the patch:

cd /sbin
cp mkbootdisk mkbootdisk.orig
patch -p0 < mkbootdisk.patch

Listing 1. mkbootdisk.patch

After the patch is applied, the following command creates the bootable recovery CD:

cd /tmp
mkbootdisk --device bootcd.iso --iso 2.4.18-14

When using the --iso option, the specified --device is expected to be a filename to which an ISO image will be written. The last parameter, 2.4.18-14, specifies which kernel to use.

We can check the ISO image by using the following commands:

cd /tmp
losetup /dev/loop1 bootcd.iso
mount /dev/loop1 /mnt

They create a loopback device on which the ISO image is then mounted. Upon inspection, you should see the complete /boot directory on the CD image.
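
The inspection also can be done mechanically. The sketch below assumes the image is still mounted on /mnt and that the patched mkbootdisk placed the files in a boot directory at the top of the image; that location is an assumption, so adjust the path to match what you actually see. The function name same_tree is only illustrative.

```shell
#!/bin/sh
# same_tree DIR1 DIR2: succeed only if both directory trees hold the
# same files with the same contents. diff -r recurses into
# subdirectories; -q names the files that differ instead of printing
# the differences themselves.
same_tree() {
    diff -rq "$1" "$2"
}

# Example, run as root while the image is mounted on /mnt.
# /mnt/boot is an assumed location for the copied files.
# same_tree /boot /mnt/boot && echo "image matches /boot"

# When finished, release the mount point and the loopback device:
# umount /mnt
# losetup -d /dev/loop1
```

Depending on how the image was built, a few harmless differences (a lost+found directory on the real partition, for example) may be reported.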

For a physical machine, this image would be burned onto a CD. For the purposes of testing, VMware can use an ISO image directly as a virtual CD-ROM drive.

Failure Testing

Now for the fun part. One of the advantages of using VMware for testing is the ability to fail hardware without having to worry about possible repercussions to physical hardware. In order to ensure that the system behaves as expected, I ran two failure tests: failing a pure RAID drive and failing a mixed native and RAID drive.

To fail a drive under VMware, I simply shut down the VM, move the files representing a particular virtual drive to a backup folder and re-create a fresh virtual drive. This process effectively creates a fresh unpartitioned drive--exactly what the situation would be if a drive had failed and been replaced.

For the first test, I "failed" the fourth drive in the array. After a successful boot in the VM, I looked at /proc/mdstat:

Personalities : [raid5]
read_ahead 1024 sectors
md0 :   active raid5 sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
        1027584 blocks level 5, 64k chunk, algorithm 0 [5/4] [UU_UU]
md1 :   active raid5 sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
        44780800 blocks level 5, 64k chunk, algorithm 0 [6/5] [UUU_UU]

It is a little counterintuitive, but the status is indicated starting with the lower drive numbers from left to right. So, for md0, [UU_UU] indicates that drives 0 and 1 are up, drive 2 is down and drives 3 and 4 are up. These correlate to sdb1, sdc1, sdd1, sde1 and sdf1, respectively. For md1, [UUU_UU] indicates that drives 0 through 2 are up, drive 3 is down and drives 4 and 5 are up. These correlate to sda2, sdb2, sdc2, sdd2, sde2 and sdf2, respectively.
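
To make the left-to-right reading concrete, here is a minimal sketch that decodes one status string into the slot numbers that are down. The function name down_slots is hypothetical; it only interprets the text and does not touch the array.

```shell
#!/bin/sh
# down_slots STATUS: print the slot numbers marked "_" (down) in an
# mdstat status string such as "[UU_UU]". Slots are counted from 0,
# left to right.
down_slots() {
    # Strip the surrounding brackets, leaving only U and _ characters.
    flags=$(printf '%s' "$1" | tr -d '[]')
    slot=0
    result=""
    while [ -n "$flags" ]; do
        rest=${flags#?}            # everything after the first character
        first=${flags%"$rest"}     # the first character itself
        if [ "$first" = "_" ]; then
            result="$result${result:+ }$slot"
        fi
        flags=$rest
        slot=$((slot + 1))
    done
    printf '%s\n' "$result"
}

down_slots "[UU_UU]"     # prints 2
down_slots "[UUU_UU]"    # prints 3
```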

As we would expect, the sdd drive has failed. At this point the RAID is running in degraded mode. If another drive were to fail, there would be data loss.
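
Because a degraded array keeps running, it is easy to miss. The following sketch lists every array whose status string shows a missing member. The function name degraded_arrays is an assumption for illustration, and the parsing relies only on the /proc/mdstat layout shown above.

```shell
#!/bin/sh
# degraded_arrays [FILE]: print the name of every md array whose status
# string contains "_" (a missing member). FILE defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        # An array header line starts with the device name (md0, md1, ...).
        /^md[0-9]+/ { name = $1; sub(/:.*/, "", name) }
        # The status string is the last field on the "blocks" line.
        /blocks/ && $NF ~ /^\[[U_]+\]$/ { if ($NF ~ /_/) print name }
    ' "${1:-/proc/mdstat}"
}

# Example:
# degraded_arrays
```

With the output shown above, this would print md0 and md1, one per line.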

We can reintegrate the "new" drive into the array while the system is running. To do this, we need to partition the drive and use the raidhotadd utility. The drive should be partitioned exactly as it was originally. For this drive, both partitions are of type Linux raid autodetect (fd). After the drive is repartitioned, execute the following commands:

raidhotadd /dev/md0 /dev/sdd1
raidhotadd /dev/md1 /dev/sdd2
cat /proc/mdstat

Afterward, you should see output similar to the following:

Personalities : [raid5]
read_ahead 1024 sectors
md0 :   active raid5 sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
        1027584 blocks level 5, 64k chunk, algorithm 0 [5/4] [UU_UU]
        [===>.................]  recovery = 18.3% (47816/256896) finish=0.5min speed=6830K/sec
md1 :   active raid5 sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
        44780800 blocks level 5, 64k chunk, algorithm 0 [6/5] [UUU_UU]

When the sync process is finished for md0, a similar process begins for md1. When completed, you should see that /proc/mdstat appears as it did earlier (with all the Us present) and that the array is no longer in degraded mode.
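
Rather than rerunning cat by hand, the wait can be scripted. This is a minimal sketch that polls for the progress line shown above; the function names and the 30-second default interval are assumptions, not part of the original setup.

```shell
#!/bin/sh
# rebuild_in_progress [FILE]: succeed while FILE (default /proc/mdstat)
# still shows a "recovery" or "resync" progress line.
rebuild_in_progress() {
    grep -Eq 'recovery *=|resync *=' "${1:-/proc/mdstat}"
}

# wait_for_rebuild [FILE] [SECONDS]: poll until no rebuild is running.
# SECONDS is the delay between checks and defaults to 30.
wait_for_rebuild() {
    while rebuild_in_progress "$1"; do
        sleep "${2:-30}"
    done
}

# Example:
# wait_for_rebuild && cat /proc/mdstat
```

Because md1 starts syncing only after md0 finishes, there may be a moment with no progress line at all. After the function returns, check that no status string still contains an underscore.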

For the second test, I "failed" the first drive in the array. For this test, we must have the bootable CD-ROM created earlier. Either it can be burned onto a CD, or the image file can be referenced in the VMware configuration (Figure 4).

Figure 4. Prepping VMware for the Bootable CD-ROM

When you boot off the CD, the welcome screen created by the mkbootdisk script appears (Figure 5). The boot fails part way through when the system attempts to mount the /boot partition. This is because the drive /dev/sda1 is not available. Enter the root password to get to maintenance mode, and then edit the filesystem table file using the command vi /etc/fstab. For now, simply comment out the line that contains the /boot entry. On my installation, the fstab file had a label reference for the /boot entry. I prefer to reference the drive directly, so I changed this entry to /dev/sda1 and then commented it out. Type exit and the system reboots, again booting off the CD. This time, it is able to start up completely.

Figure 5. Boot Disk Welcome Screen
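
If you would rather not edit the file by hand, the fstab change described above can be sketched non-interactively. The function name comment_out_boot is hypothetical, and the example works from a backup copy so the original entry is never lost.

```shell
#!/bin/sh
# comment_out_boot FILE: write a copy of FILE to standard output with
# the /boot entry commented out. An fstab entry is identified by its
# second field (the mount point), so this works whether the first
# field is a label or a device name. Lines that are already comments
# are left alone because their first field starts with "#".
comment_out_boot() {
    awk '$1 !~ /^#/ && $2 == "/boot" { print "#" $0; next } { print }' "$1"
}

# Example (as root, in maintenance mode). If the root filesystem is
# mounted read-only at this point, remount it read-write first:
# mount -o remount,rw /
# cp /etc/fstab /etc/fstab.orig
# comment_out_boot /etc/fstab.orig > /etc/fstab
```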

You should notice that the md1 RAID volume is running in degraded mode by inspecting /proc/mdstat, as before. The tasks to restore the failed first drive are as follows:

  1. Partition the drive.

  2. Use the raidhotadd utility to rebuild the md1 RAID.

  3. Format the native partition on the drive.

  4. Copy the /boot files from the CD to the drive.

  5. Uncomment the /boot entry in the /etc/fstab file.

  6. Install the GRUB boot loader in the MBR (master boot record) of the drive.

The drive should be partitioned exactly as it was originally. That is, the first 250MB partition should be type Linux (83), and the second 8750MB partition should be type Linux raid autodetect (fd). You can then enter the command:

raidhotadd /dev/md1 /dev/sda2

to rebuild the md1 RAID. Inspect /proc/mdstat as before to check on the status of the synchronization process.

The native partition should be formatted with the command mke2fs /dev/sda1. Assuming that the CD-ROM drive is mounted on /mnt/cdrom, the following commands restore the /boot partition:

mount /dev/sda1 /boot
cp -p -r /mnt/cdrom/boot/* /boot

Next, edit /etc/fstab, and uncomment the line containing the /boot partition. Finally, use GRUB to install the boot loader on the drive's MBR. A thorough discussion of GRUB is outside the scope of this article, but the following commands, entered at the grub prompt, use the original GRUB configuration defined when Red Hat 8.0 was installed:

root (hd0,0)
setup (hd0)
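
The same two commands also can be fed to the GRUB shell without typing them interactively. This sketch assumes the GRUB legacy grub binary that ships with Red Hat 8.0, whose --batch option reads commands from standard input; the function name is illustrative only.

```shell
#!/bin/sh
# grub_setup_commands DISK PART: print the GRUB shell commands that
# install the boot loader in the MBR of BIOS disk DISK, taking the
# GRUB files from partition PART of that disk (both numbered from 0).
grub_setup_commands() {
    printf 'root (hd%s,%s)\n' "$1" "$2"
    printf 'setup (hd%s)\n' "$1"
    printf 'quit\n'
}

# Example (as root): first disk, first partition, as in this article.
# grub_setup_commands 0 0 | grub --batch
```

Printing the commands separately from running them lets you review exactly what will be sent to GRUB before anything is written to the MBR.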

Once the md1 RAID is rebuilt, the system is ready to be rebooted without the recovery CD. Make sure the recovery CD is removed from the CD-ROM drive or that the image reference in the VMware configuration is removed, and reboot. The system should come up normally. A look at /proc/mdstat should show both RAID volumes, with all members up and running.




backup of /boot partition and MBR on second disk rather than CD?

Is there any reason that the following wouldn't work as an alternative to using a boot CD to back up the /boot partition and master boot record in case of a failure on the first disk:

  • Maintain a copy of the boot partition on the second disk and a copy of the boot manager MBR in the MBR area of the second disk (presumably configured to use the boot partition on the second disk).
  • In case of a failure of the first disk, use the BIOS to switch to booting from the second disk by toggling "bootable" bits on the partitions?

    (Can BIOSes typically boot from a second disk?)

edit which /etc/fstab?

In your second test and its recovery steps, you say to edit /etc/fstab to comment out the /boot entry.

Does the boot fail after the RAID drivers/modules are loaded, so that the volume containing /etc/fstab is available?

Re: High Availability Linux with Software RAID

using soft raid for swap is a waste of CPU,

linux can do the same without soft raid:

just add the option "pri=1" to all swap partitions in /etc/fstab

and linux will use them as if they were part of a striped soft raid.

Re: High Availability Linux with Software RAID

In case of a _real_ drive failure, Linux can (and, imho, will in 99.99% of cases) panic.

In our case the drive didn't respond, the stupid scsi driver tried to reset the scsi adapter, then the kernel died...

Certainly, this is far better than the loss of a _full_ filesystem, but..

Hardware raid is the _only_ choice for servers ...

Re: High Availability Linux with Software RAID

I've used RAID and forced failures in dozens of ways and NEVER had a kernel panic. This is with Adaptec and Symbios controllers, and with basically unplugging the drive from a hot shoe while the server was running and serving requests (in a test environment, as well as actual failures in the real environment).

I did, however, know a coworker using a HW RAID controller who had it mark two disks bad because the cable to them had slipped off while the server was being moved. Guess who had to rebuild and restore his whole RAID array because his $1000 RAID card wouldn't let him restart the RAID5 in place due to two bad drives.

P.S. the CPU load on my dual PIII 750 running flat out accessing its raid arrays is about 1% of a CPU. If you have to worry about 1% of your CPU you have a lot of things on your plate ahead of that.

Re: High Availability Linux with Software RAID

I don't think the author was intending to achieve max performance, but rather guaranteed availability.

If you use the partitions directly in the fstab with pri=1 and a drive fails, then the machine will probably go down, since a portion of the swap space is now corrupt. However, if they are on a RAID 5 setup, the machine will just keep on humming, assuming you don't have a 2nd drive failure.

Re: High Availability Linux with Software RAID

Yes, you could do that, but then you lose HA, because swap will fail as soon as a disk with a swap partition fails.

Performance-wise it would be better to use raid 1 than raid 5 for swap.

Re: High Availability Linux with Software RAID

Does anybody have info on how to do this using User Mode Linux?

Re: High Availability Linux with Software RAID

UML is part of the kernel, so it is not affected by the RAID subsystem underneath it. You just need to set up the RAID disk system as explained, and then install a UML kernel, and away you go.

Re: High Availability Linux with Software RAID

that's not totally true ...
a bug in the ubd driver in uml prevents raidhotadd from working correctly. the bug is known, and a patch is available to fix it (it will be in the next uml release)


Re: High Availability Linux with Software RAID

If one is looking to truly run an HA server, would it not be better to make /boot a RAID-1 array and use a ramdisk to boot the machine and allow access to the software RAID? Also, for better performance of the swap partition, rather than creating a software RAID disk for swap, set all the relevant partitions to swap space and set them to equal priority in /etc/fstab so that they are used as a RAID-0 array, without the overhead of the software RAID system running.

Re: High Availability Linux with Software RAID

Having swap on RAID is a good idea, otherwise a single disk error can make your machine crash.

I would tend to disagree with

I would tend to disagree with the whole concept of placing your swap on a raid partition.

See line #18 in the link below for more information:


We're not talking about strip

We're not talking about striping though, but mirroring, so if one drive dies, all the data written to swap doesn't go down with it, as that would be double plus ungood.
