RAID-1, Part 2
If you are using RAID-1 to help ensure that your system stays up when a hard disk or partition fails, you should consider mirroring your swap partition(s) as well. If the disk or partition you are using for swap goes bad, your machine may crash. Using a RAID-1 device for swap can help prevent that crash: if one of the mirrored swap partitions goes bad, the kernel automatically fails over to the other, and your system should keep running until you can fix the disk problem. The steps for setting up swap on a RAID-1 device are outlined below:
Partition the second disk. See Part 1 of this article for details about this step.
Create /etc/raidtab with an entry for the swap partition. The raidtab file is used by mkraid to configure the RAID device and write the RAID superblock. Once the RAID device is configured, the RAID superblock is used to detect the device. The raidtab file is not used when an existing RAID device is activated. The raidtab entry that was used in this example is shown in Listing 1.
Turn off swap so that the swap RAID array can be created. If your machine is lightly loaded, you may be able to turn off swap without causing problems. However, turning off swap could cause a machine to crash, so don't do it unless you can recover from a crash. To be safe, you can go to single-user mode and stop all the user processes on the machine. On a Linux machine, you can turn off swap with the command swapoff -a; the command swapoff /dev/swappartition also may work. Typing swapon -s before you turn swap off shows the name of the swap partition; typing it again after you run swapoff confirms that swap is off. A safer way to turn off swap is to disable the swap device in /etc/fstab and reboot the machine with no swap enabled. That way, there's no possibility of causing a crash, because swapoff does not have to be invoked.
Use fdisk to change the partition type of the swap partition on the first disk to fd (Linux RAID autodetect). You should have already set the swap partition on the second disk to fd. If you compile RAID-1 support into the kernel and have the swap partition type set to fd, the kernel can start the swap RAID-1 array during boot. Otherwise, you'll have to use init scripts to start the swap RAID-1 array after the filesystem that contains the md module is mounted. Some will prefer to set up the RAID-1 arrays this way; we don't cover that approach in this article.
Make the new RAID-1 swap array with mkraid /dev/md2. After you run that, type cat /proc/mdstat; the output should indicate that the RAID-1 personality exists (Personalities : [raid1]) and that /dev/md2 is active (md2 : active raid1 hdc4 hda4). If it doesn't, you're in troubleshooting mode. For troubleshooting help, use the references provided below.
Make the new RAID array a swap partition. Use the command mkswap /dev/md2, but do not run mkswap on the RAID-1 component partitions, in this case /dev/hda4 and /dev/hdc4.
Turn swap on with swapon /dev/md2. swapon -s should show that the /dev/md2 device is being used for swap.
To use the RAID-1 array /dev/md2 as a swap partition on boot, edit the /etc/fstab file. The line should read /dev/md2 swap swap defaults 0 0.
At this point you can reboot the system to test the swap RAID-1 array configuration. If you are in single-user mode you can use init 3 to bring the user processes back up. When you reboot you should see something like the following in the boot.log and dmesg:
    md: raid1 personality registered as nr 3
    md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
    md: Autodetecting RAID arrays.
    md: considering hdc4 ...
    md: adding hdc4 ...
    md: adding hda4 ...
    md: created md2
    raid1: raid set md2 active with 2 out of 2 mirrors
    ......
    Adding Swap: 795128k swap-space (priority -1)
If RAID support is not compiled into the kernel, you may see an initial failure when starting swap as the system boots. As the OS transitions to multi-user mode, swap will become available. This is just something we noticed before we had RAID support compiled in.
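The swap setup steps above can be collected into a short shell sketch. It assumes the device names from this example (/dev/md2 built from /dev/hda4 and /dev/hdc4) and that /etc/raidtab and the partition types are already in place. The RUN variable, the --go argument and the function names are our own illustration, not part of raidtools: by default each command is only echoed, so the sequence can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch of the swap-on-RAID-1 sequence; device names follow the article's example.
MD_DEVICE=${MD_DEVICE:-/dev/md2}

# Hypothetical helper: with RUN unset, commands are echoed rather than executed.
# Invoke the script as "RUN= sh raid-swap.sh --go" to execute them for real.
RUN=${RUN-echo}

# Succeeds if the named array (e.g. md2) is listed as an active raid1 device.
# $2 is the status file to read; it defaults to /proc/mdstat.
md_is_active() {
    grep -q "^$1 : active raid1" "${2:-/proc/mdstat}"
}

setup_raid_swap() {
    # Stop using the old swap partition.
    $RUN swapoff -a || return 1
    # Build the array described in /etc/raidtab.
    $RUN mkraid "$MD_DEVICE" || return 1
    # When running for real, stop unless /proc/mdstat shows the array as active.
    if [ -z "$RUN" ] && ! md_is_active "${MD_DEVICE#/dev/}"; then
        echo "$MD_DEVICE is not active; check /proc/mdstat" >&2
        return 1
    fi
    # Format the array itself, never its component partitions.
    $RUN mkswap "$MD_DEVICE" || return 1
    # Start swapping on the mirror and show the result.
    $RUN swapon "$MD_DEVICE" || return 1
    $RUN swapon -s
}

if [ "${1:-}" = "--go" ]; then
    setup_raid_swap
fi
```

Run without arguments, the script does nothing; run with --go and RUN unset, it prints the five commands in order.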
Listing 1. raidtab Entry for the Swap RAID-1 Array
raiddev /dev/md2
    raid-level 1
    nr-raid-disks 2
    persistent-superblock 1
    chunk-size 4
    device /dev/hda4
    raid-disk 0
    device /dev/hdc4
    raid-disk 1
You could leave your machine set up to boot from an ext2 partition rather than from a RAID array. This is more straightforward than booting from a RAID-1 array, because no changes have to be made to the boot process you've been using. The root partition data does not change often. To ensure that the root partition data will be available if one of the disks fails, you could use a daily cron job that runs rsync, or some other method, to copy the contents of the first disk's root partition to the root partition on the second disk.
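One way to sketch the daily sync is a small script run from cron. The device /dev/hdc1, the mount point /mnt/rootmirror, the script path and the schedule are assumptions for illustration. The rsync options shown (-a to preserve ownership and permissions, -H for hard links, -x to stay on the root filesystem, --delete to remove files that no longer exist on the source) are one reasonable choice, not the only one. The RUN variable and --go argument are our own additions: commands are only echoed unless RUN is set to an empty string.

```shell
#!/bin/sh
# Copy the running root filesystem to the spare root partition on the second disk.
# Example crontab entry (hypothetical path and time):
#   30 3 * * * RUN= /usr/local/sbin/sync-root.sh --go
MIRROR_DEV=${MIRROR_DEV:-/dev/hdc1}         # assumed root partition on disk 2
MIRROR_MNT=${MIRROR_MNT:-/mnt/rootmirror}   # assumed mount point
RUN=${RUN-echo}                             # unset RUN means dry run

sync_root() {
    $RUN mount "$MIRROR_DEV" "$MIRROR_MNT" || return 1
    # -x keeps rsync from descending into /proc, the mirror itself and other mounts.
    $RUN rsync -aHx --delete / "$MIRROR_MNT"/
    status=$?
    # Always unmount, then report the rsync result.
    $RUN umount "$MIRROR_MNT"
    return $status
}

if [ "${1:-}" = "--go" ]; then
    sync_root
fi
```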
Booting from a RAID-1 device is easy if you use a modern version of LILO. For the simplest setup, list your RAID device as the boot device in the lilo.conf file, boot=/dev/mdX. LILO notices the RAID setup and will write the boot code to the correct device.
A problem with this simple setup is that LILO only writes its boot code to one disk (the one it thinks is currently being used to boot). If this disk dies and you try to boot from the remaining good disk, it won't work (no boot code). The raid-extra-boot option of LILO version 22.0 or greater tells LILO to write emergency boot code to other partitions.
If your boot RAID device is made up of partitions on /dev/hda and /dev/hdc, adding this line
    raid-extra-boot=/dev/hda,/dev/hdc
will make LILO write normal boot code to /dev/hda and emergency boot code to /dev/hdc. Both disks should now be bootable.
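For context, a minimal lilo.conf sketch that combines the two options might look like the following. The array name /dev/md0, the kernel path and the label are assumptions for illustration; only boot= and raid-extra-boot come from the discussion above.

```
# /etc/lilo.conf (fragment; names below are examples only)
boot=/dev/md0
raid-extra-boot=/dev/hda,/dev/hdc

image=/boot/vmlinuz
    label=linux
    root=/dev/md0
    read-only
```

Remember to rerun lilo after editing the file so the boot code is actually rewritten.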
Booting RAID devices is somewhat hairy. We strongly recommend having a tested boot floppy around just in case. Simply copy your kernel to a (known good) floppy and use the rdev program to change its root device to be your RAID root device (e.g., /dev/md0). The RAID autodetect code will take care of the rest.
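A sketch of the boot-floppy steps follows. It assumes the kernel image is /boot/vmlinuz, the floppy is /dev/fd0, the RAID root device is /dev/md0 and that "copy" means a raw copy with dd; adjust all of these for your system. The RUN variable and --go argument are our own additions, so the commands are only echoed by default.

```shell
#!/bin/sh
KERNEL=${KERNEL:-/boot/vmlinuz}   # assumed kernel image path
FLOPPY=${FLOPPY:-/dev/fd0}        # assumed floppy device
ROOT_DEV=${ROOT_DEV:-/dev/md0}    # RAID root device from the example
RUN=${RUN-echo}                   # unset RUN means dry run; "RUN=" executes

make_boot_floppy() {
    # Write the raw kernel image to the floppy.
    $RUN dd if="$KERNEL" of="$FLOPPY" || return 1
    # Record the RAID device as the root device in the image on the floppy.
    $RUN rdev "$FLOPPY" "$ROOT_DEV"
}

if [ "${1:-}" = "--go" ]; then
    make_boot_floppy
fi
```

Whatever method you use, boot from the floppy once while both disks are healthy to confirm that it works.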
Replacing a failed disk is easy if it isn't the boot device. If it is the boot device, you have to figure out some other way to boot the machine. It is easiest to use a floppy (make sure you have RAID support on it, and set its root device to be your correct root partition or RAID device). Alternatively, new versions of LILO claim to write supplemental boot records to all your RAID devices allowing any of them to boot.
Replace the failed hard disk. You don't have to do anything special with the RAID setup. Shut the machine down normally, pull out the old disk and stick a new one in its place. You also can move your remaining good disk around if you want to try to boot from it. The RAID code doesn't care where the hard disks actually are. The name of the RAID device (md0, md1, mdX) is stored in the RAID superblock, so the correct partition always will show up in the right place, no matter where the physical disks may be.
Boot the machine. It should come up with degraded RAID devices, but otherwise run fine. The new disk will be ignored by the RAID code as it doesn't have any RAID signatures. This is the only reboot required.
Partition the new disk. Try to match the new partition sizes and locations (for your sanity) to those on the remaining good disk. As your replacement disk quite likely will be bigger than the old one, make sure you leave a free partition entry or two so you can make use of the extra space in the future.
Use the raidhotadd command to insert the new partitions into the running RAID array. For instance: raidhotadd /dev/md0 /dev/hdb2.
Watch the mirror being rebuilt by looking in /proc/mdstat.
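The rebuild can also be followed from a script. The sketch below simply looks for a recovery or resync progress line in /proc/mdstat. The exact layout of that file varies between kernel versions, so treat the pattern and the function names as assumptions to check against your own system.

```shell
#!/bin/sh
# Print any rebuild progress lines; succeeds only while a rebuild is reported.
# $1 is the status file to read and defaults to /proc/mdstat.
rebuild_progress() {
    grep -E 'recovery|resync' "${1:-/proc/mdstat}"
}

# Poll until no progress line remains, printing the status every 10 seconds.
wait_for_rebuild() {
    while rebuild_progress "$@"; do
        sleep 10
    done
    echo "no rebuild in progress"
}
```

Calling wait_for_rebuild right after raidhotadd prints the progress line periodically and returns once the mirror no longer reports a rebuild.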
A RAID-1 array normally is made up of two devices. The two devices contain exactly the same data. This works well with IDE disks as the vast majority of motherboards have two IDE controllers.
With only two devices, you don't have any redundancy when one fails. A way around this is to place a third disk in the machine and list it with the spare-disk directive in /etc/raidtab. When one of your active disks fails, the RAID code automatically will rebuild the mirror on the spare disk and use it in the RAID device in place of the failed disk.
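As an illustration, a raidtab entry with one spare might look like the following sketch. The array name and the partitions are hypothetical (here the spare, /dev/hde1, is assumed to sit on a third controller); the nr-spare-disks and spare-disk directives are the parts that matter.

```
raiddev /dev/md0
    raid-level            1
    nr-raid-disks         2
    nr-spare-disks        1
    persistent-superblock 1
    chunk-size            4
    device                /dev/hda1
    raid-disk             0
    device                /dev/hdc1
    raid-disk             1
    device                /dev/hde1
    spare-disk            0
```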
Although this seems like a good idea, there are some drawbacks. As mentioned previously, most motherboards only have two IDE controllers. A failed disk easily can take its controller with it when it dies. If the remaining good disk and the spare disk are on the same controller, your RAID performance will really suffer. IDE disks do not play well with others, so you need a third controller. There is also a financial drawback. Hard disks are getting cheaper. It doesn't make much sense to buy a 40GB spare disk today and let it sit unused for a year, by which point you may be able to buy a 130GB disk for the same price. Finally, each additional disk adds to the heat load in your server.
RAID-1 also may be used as a tool for creating consistent backups of large or busy ext2 filesystems. Numerous file modifications can take place during the time it takes to dump a large or busy filesystem to tape. To take a snapshot of your filesystem at a particular moment in time, one of the component devices in your RAID-1 setup can be taken off-line and remounted as a read-only filesystem. When the backup is complete, the component device may be re-added to the RAID array and resynced. If you have several RAID devices that require consistent backup support, you should consider allocating an extra partition for that task. The extra partition should be at least as large as the biggest RAID device you wish to back up.
The following steps explain how to perform a backup using this method:
Make sure all RAID components of the device you want to back up are synchronized. You can choose to use one of the devices currently attached to your mirror as the "dump" device. However, the availability of your md device will be jeopardized by doing so. We recommend attaching and syncing a third partition when performing backups of this type.
Remove the device you wish to back up by failing and removing one of the components from your RAID-1 device:
    mdctl --fail /dev/md0 /dev/hdc1
    mdctl --remove /dev/md0 /dev/hdc1
Remount the dump device and perform your backup. This example illustrates the use of a SCSI tape drive as the backup storage device (st0):
    mount -r /dev/hdc1 /mnt/backup
    dump -f /dev/st0 /mnt/backup
Re-add the device to your RAID-1 array. If you are using an extra partition for backup purposes, this step is optional:
    umount /mnt/backup
    mdctl --add /dev/md0 /dev/hdc1
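Before failing a component, it is worth confirming that the array is fully synchronized, as the first step advises. A minimal sketch of such a check follows. It assumes the status line for the array in /proc/mdstat ends in a map such as [UU], where an underscore marks a missing or unsynchronized member. The function name is our own, and the layout should be verified on your kernel.

```shell
#!/bin/sh
# Succeeds when every member of the named array is marked "U" (up and in sync).
# $1 = array name such as md0; $2 = status file, default /proc/mdstat.
array_in_sync() {
    awk -v md="$1" '
        # Header line of the requested array, e.g. "md0 : active raid1 ..."
        $1 == md && $2 == ":" { found = 1; next }
        # Its status line ends in a member map such as "[UU]" or "[U_]"
        found && /blocks/     { ok = ($NF ~ /^\[U+\]$/); exit }
        END                   { exit !(found && ok) }
    ' "${2:-/proc/mdstat}"
}

# Example use before taking the dump device off-line:
#   array_in_sync md0 && mdctl --fail /dev/md0 /dev/hdc1
```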
Set up a monitoring cron job to alert you if the RAID-1 device has problems. We use a script that compares a good copy of /proc/mdstat with the existing mdstat data. If diff finds that the data differs, an e-mail is sent to your sysadmin. The scripts found at www.1U-Raid5.net/Monitoring may help get you started.
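A minimal sketch of this kind of monitor is shown below. It is not the script we use; the saved-copy path /etc/mdstat.good, the recipient, the script path and the schedule are assumptions. The saved copy would be created once, while the arrays are healthy, with cp /proc/mdstat /etc/mdstat.good.

```shell
#!/bin/sh
# Example crontab entry (hypothetical path and schedule):
#   */15 * * * * /usr/local/sbin/check-raid.sh --run
GOOD=${GOOD:-/etc/mdstat.good}     # assumed location of the known-good copy
CURRENT=${CURRENT:-/proc/mdstat}
ADMIN=${ADMIN:-root}               # assumed mail recipient

# Succeeds (returns 0) when the two files differ or cannot be compared.
mdstat_differs() {
    ! diff "$1" "$2" >/dev/null 2>&1
}

check_raid() {
    if mdstat_differs "$GOOD" "$CURRENT"; then
        # Mail the differences so the report shows what changed.
        diff "$GOOD" "$CURRENT" | mail -s "RAID status changed on $(hostname)" "$ADMIN"
    fi
}

if [ "${1:-}" = "--run" ]; then
    check_raid
fi
```

Because a missing saved copy also counts as a difference, the job complains rather than staying silent if the reference file is lost.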
"Kernel Korner: The Linux RAID-1, 4, 5 Code", Linux Journal, December 1997.
Usenet; one of the archives is groups.google.com. The following search queries may help to get you started: +raid1, +failed-disk +linux, +raid1 +swap +linux and +linux +raid +superblock.
If you're curious about the RAID superblock, you can find a description in the mdctl-0.5 source code. Take a look at the file md_p.h. You also can take a look at the kernel md driver source code files, including /usr/src/linux/drivers/md/md.c.
Thanks to those who developed the Linux RAID code (see drivers/md/md.c for names), Jakob Østergaard for "The Software-RAID HOWTO", the Usenet correspondents and Neil Brown for mdctl.
Joe Edwards, PE, PhD wrote his first useful program using FORTRAN on an IBM 370 almost 30 years ago. The program performed forensic analysis of X-ray diffraction data. He started using Linux in 1995. He is the lead programmer, sysadmin and dba for the GeneTests-GeneClinics Projects at the University of Washington.
Audin Malmin is a programmer and sysadmin for the GeneTests-GeneClinics Projects at the University of Washington. His first programming experiences were on his dad's Timex Sinclair 1000. He first experimented with Linux in 1996 on his 386sx16 with 3MB of RAM.
Ron Shaker is the lead programmer on the GeneSeek Project at the University of Washington. He has worked as a sysadmin, dba and systems engineer over the past 13 years and began using UNIX in 1988.