RAID-1, Part 1
Part 1 of this two-part series describes RAID, in which cases RAID-1 is useful, the RAID-1 installation requirements and how to install RAID-1 when you have an existing ext2 filesystem. Part 2 covers how to set up RAID-1 with an existing and a new swap partition, how to boot from a RAID-1 device, how to use a RAID-1 array to facilitate backing up a busy filesystem or a database that cannot be taken off line for long and how to set up monitoring scripts that will notify you of problems.
We wrote this article because we could not find a complete description of the setup process, and we wanted to document what we learned to make it easier for others to implement RAID-1. We learned how to implement RAID by reviewing the "Software-RAID HOWTO" and Usenet correspondence as well as through trial and error. This article includes information from the HOWTO and Usenet archives. Please read the HOWTO and the other resources called out in the references section for a lot more information about RAID.
This article focuses on RAID-1. The Linux software RAID driver supports five modes: Linear mode, RAID-0, RAID-1, RAID-4 and RAID-5. RAID-1 maintains an exact mirror of the data on one disk on another disk. If one of the active RAID disks is removed (or fails), the data are still intact. If spare disks are available, and if the system survived the crash, reconstruction of the mirror begins immediately on one of the spare disks. If there is no spare disk, the system continues to run on the remaining good disk until you can obtain and install a replacement. RAID-1 is an effective, inexpensive way to help ensure that your system stays up when a hard disk fails. You also can use a RAID-1 device to facilitate backing up a busy filesystem.
A RAID-1 device (e.g., /dev/md0) maintains an exact copy (mirror) of the files in a given partition (e.g., /dev/hda2) on a separate partition (e.g., /dev/hdc2). The Linux RAID code mirrors partitions, not entire disks. The partitions that make up a RAID device set should be on separate hard disks. Write performance is slightly worse than on a single device, because identical copies of the data written must be sent to every disk in the array. The write is not complete until all disk writes are finished. Reading may be faster than without RAID-1, depending on the read-balancing strategy that is implemented. We did not benchmark our RAID setups. We are using RAID to provide data redundancy, not to improve disk performance.
We suffered a hard disk failure on our production web server a few months after we installed RAID-1 on the system. We noticed that a hard disk partition had failed during a routine review of our system logs. The RAID code is truly transparent: we didn't notice the failure until three days after it happened. If we had not been using RAID-1, we would have found out when the system crashed. We have about twelve staff members who work on the server and hundreds of users who access our web site on a daily basis. They would have lost time had the server gone down. Instead, we replaced the failed disk during scheduled downtime. A second (identical) disk failed a few weeks later with similar results. This was a much better and less stressful outcome for all concerned. We have since set up scripts that monitor the status of the RAID devices and send e-mail alerts when there are problems.
There are three requirements for RAID-1: the kernel must support RAID-1, the RAID device driver must be compiled into the kernel or be available as a module and raidtools must be installed. Technically, you can use RAID with just one hard disk but most installations use more than one disk.
If /proc/mdstat exists, RAID support was compiled into the kernel. The Personalities listing indicates which RAID devices are available. For example:
    more /proc/mdstat
    Personalities : [raid1]
    read_ahead not set
    unused devices: <none>
indicates that the kernel supports RAID and the RAID-1 device driver is loaded (RAID-1 may have been compiled into the kernel or have been loaded as a module).
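The check described above can be scripted. The following is a minimal sketch, not part of raidtools: the function name has_raid1 and its optional file argument are our own, and it assumes the Personalities line has the form shown in the example.

```shell
#!/bin/sh
# has_raid1 [FILE]: succeed if FILE (default /proc/mdstat) lists the raid1
# personality. The function name and FILE argument are illustrative only.
has_raid1() {
    mdstat=${1:-/proc/mdstat}
    # A missing mdstat file means the kernel has no md (RAID) support.
    [ -r "$mdstat" ] || return 2
    # Look for [raid1] on the Personalities line, ignoring letter case.
    grep -qi '^Personalities.*\[raid1\]' "$mdstat"
}
```

Calling has_raid1 with no argument tests the running kernel. A return status of 2 means /proc/mdstat could not be read, and a status of 1 means RAID support is present but the RAID-1 driver is not loaded.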
We've set up RAID-1 arrays on Red Hat 7.0, 7.1 and 7.2 and Debian potato using the 2.4.4, 2.4.9, 2.4.12 and 2.4.17 kernels that include RAID support, and the 2.2.16-22 kernel with the md driver 0.90.0 patch. If you are using a kernel that does not include RAID support, you may use the RAID-patches and raidtools found at people.redhat.com/mingo.
If you're using IDE for your RAID devices, you should install them on different controllers. SCSI RAID setups can get away with using the same bus, but they have a greater chance of a broken disk taking down the whole machine. Also consider using disks from different manufacturers (or at least from different lots) in your RAID array. Our initial array was created with two identical hard disks, and the disks failed within a month of each other.
The partition that will be used for the RAID-1 array on the second disk should be about the same size as that on the first disk. It must be at least as large as the first disk's partition. If the second disk's partition is larger, the extra space will not be used by the RAID-1 device. The smallest partition determines the size of the RAID-1 device.
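To make the sizing rule concrete, here is a small sketch that prints the size a mirror would have given the sizes of its component partitions. The function name mirror_size is hypothetical, and the block counts in the usage note are made-up values, not measurements from our systems.

```shell
#!/bin/sh
# mirror_size SIZE...: print the smallest of the given partition sizes,
# which is the usable size of a RAID-1 device built from them.
# (Illustrative helper; not part of raidtools.)
mirror_size() {
    smallest=$1
    for size in "$@"; do
        if [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    echo "$smallest"
}
```

For example, mirror_size 2104483 2250000 prints 2104483: the extra blocks on the larger partition are simply unused.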
Before you begin, if the /usr partition is in use and contains data that you do not want to lose, back up your data. The following list outlines the steps used to configure /usr as a RAID-1 array:
1. Partition the second hard disk, if needed. Partition sizes should be calculated based on sectors, not cylinders. If using fdisk to partition the disks, use the -u option. The smallest partition of a mirror set determines that metadevice's maximum size. Getting the partition sizes to match exactly is almost impossible. Just try to get them close while making sure the second disk's partitions are the same size or larger. You may ignore any fdisk warnings about cylinder boundaries; they are a problem only if you'll be accessing the disk with DOS/Windows. If you want the RAID-1 devices to be mounted during boot, change the partition type to "fd" (Linux RAID autodetect).
2. Go to single-user mode using the command init 1 so that /usr can be unmounted. Alternatively, you could stop all processes using the /usr partition. The fuser program can help you if you wish to take this route.
3. Copy the contents of /usr to /var/usr.save or another partition with enough room so that you can restore the files after you make the RAID. You will have to make a new ext2 filesystem on the original /usr (in this case /dev/hda2) partition.
4. Unmount /usr with umount /usr. When you went to single-user mode, the processes that were using the /usr partition should have been stopped. If you can't unmount /usr, determine which processes are still using /usr and stop them.
5. Use fdisk or an equivalent partition manager to toggle the partition type of the /usr partition on the original disk to fd (Linux RAID autodetect). The new disk was set to type fd in step 1.
6. Create /etc/raidtab with an entry for the /usr partition array (Listing 1). (Note that if you use Emacs, or some other editor that lives in /usr, you'll have to make do with vi or something else that lives in /bin.)
7. Make the new RAID-1 /usr array, using mkraid /dev/md0. Now type cat /proc/mdstat, which should indicate md0 : active raid1 hdc2 hda2. You will see a progress meter measuring the state of the mirroring process. The RAID code knows nothing about what is actually on its disks, so it is now copying all the data from /dev/hda2 to /dev/hdc2. This copying happens transparently in the background, so you can now go to step 8 and format the device.
8. Create an ext2 filesystem on /dev/md0 using the command mke2fs /dev/md0. Do not run mke2fs on the RAID-1 component partitions, in this case /dev/hda2 and /dev/hdc2. If you do not create an ext2 filesystem on /dev/md0, then e2fsck /dev/md0 will return an error message, something like this:
    The filesystem size (according to the superblock) is 2104483 blocks.
    The physical size of the device is 2104384 blocks.
    Either the superblock or the partition table is likely to be corrupt.
This is because mkraid writes the RAID superblock near the end of the component partitions, which makes the usable device smaller than the original partition. e2fsck does not know about the RAID superblock, so it sees only a filesystem that claims to be larger than its device. You can mount /dev/md0 at this point, and even use /usr, but the ext2 filesystem superblock contains incorrect information. You may not notice problems, but you should not use the filesystem in this state. You will not be able to boot and mount /dev/md0 unless you turn off the filesystem checking by making the appropriate entry in fstab (e.g., /dev/md0 /usr ext2 defaults 1 0). The 0 at the end of the line causes e2fsck to be skipped. Do not do this unless you have to fix your RAID. Make /dev/md0 an ext2 filesystem.
9. Mount the array on /usr using the command mount /dev/md0 /usr.
10. Copy the contents of /var/usr.save (or wherever) to /usr.
11. To mount the RAID-1 /usr array /dev/md0 on boot, edit the /etc/fstab file. The line should look something like /dev/md0 /usr ext2 defaults 1 2.
12. Go to multi-user mode (init 3), and you're now using the /usr RAID-1 array.
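The non-interactive commands from the steps above can be collected into a short plan. The sketch below only prints the commands; it does not run them. The function name raid1_usr_plan is hypothetical, and the mkdir and cp -a lines are our assumption about how to do the copy, since the steps do not prescribe a particular copy command. Partitioning with fdisk and editing /etc/raidtab and /etc/fstab are interactive and are left out.

```shell
#!/bin/sh
# raid1_usr_plan [MD_DEVICE] [SAVE_DIR]: print, without running, the
# commands behind the non-interactive steps for moving /usr onto RAID-1.
# (Illustrative helper; the cp -a copy method is an assumption.)
raid1_usr_plan() {
    md=${1:-/dev/md0}
    save=${2:-/var/usr.save}
    cat <<EOF
init 1
mkdir -p $save
cp -a /usr/. $save
umount /usr
mkraid $md
cat /proc/mdstat
mke2fs $md
mount $md /usr
cp -a $save/. /usr
init 3
EOF
}
```

Running raid1_usr_plan with no arguments prints the sequence for /dev/md0 and /var/usr.save. We suggest reading the output as a checklist rather than piping it to a shell, because the partition type change and the raidtab entry must be in place before mkraid is run.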
At this point, you can reboot the system to test the RAID-1 /usr array boot configuration. When you reboot you should see something like the following in the boot.log and dmesg:
    md: raid1 personality registered as nr 3
    md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
    md: Autodetecting RAID arrays.
    md: considering hdc2 ...
    md: adding hdc2 ...
    md: adding hda2 ...
    md: created md0
    raid1: raid set md0 active with 2 out of 2 mirrors
    ......
    fsck: /dev/md0: clean
    rc.sysinit: Mounting local filesystems: succeeded.
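If you would rather not read the boot messages by eye, a check along the following lines can confirm that every array came up with all of its mirrors. This is a sketch under the assumption that the kernel reports each array with an "active with N out of M mirrors" line as shown above; the function name mirrors_ok is our own.

```shell
#!/bin/sh
# mirrors_ok [FILE]: succeed only if FILE (or standard input) contains at
# least one "active with N out of M mirrors" line and N equals M on all
# of them. (Illustrative helper; not part of raidtools.)
mirrors_ok() {
    awk '
        /active with [0-9]+ out of [0-9]+ mirrors/ {
            found = 1
            # The mirror counts follow the word "with": "with N out of M".
            for (i = 1; i <= NF; i++)
                if ($i == "with") { have = $(i + 1); want = $(i + 4) }
            if (have != want) bad = 1
        }
        END { if (found && !bad) exit 0; exit 1 }
    ' "${1:--}"
}
```

For example, dmesg | mirrors_ok && echo "all mirrors active" prints the message only when no array started in degraded mode.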
Instead of copying the /usr partition to a temporary location, you may rely on the failed-disk directive and mark the existing disk (the one with the data) as failed-disk 0 and the new disk as raid-disk 1. Then run mkraid and make an ext2 filesystem on the RAID array. The failed disk will be ignored by the RAID code. Then mount the RAID device on a temporary mountpoint as a normal ext2 partition, and copy over the /usr content. Unmount the failed disk, and use raidhotadd to add it to the new RAID array. Mount the RAID device as /usr, and go back to multi-user status. This procedure is riskier than copying the files to a different partition. If you make a mistake in raidtab, you could overwrite the existing partition. And if the disk is completely full, an undesirable situation in any case, you may have problems when the RAID superblock is written; we did not test this case.
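As a sketch of what the raidtab entry for this alternative might look like, the function below prints an entry in which the existing partition is marked failed. It reuses the device names from our example; the function name failed_disk_raidtab and the temporary mountpoint /mnt/newusr in the comments are hypothetical, and the cp -a copy is an assumption. We did not use this exact script.

```shell
#!/bin/sh
# failed_disk_raidtab [EXISTING] [NEW]: print a raidtab entry in which the
# partition that still holds the data is marked failed, so that mkraid
# leaves it alone. (Illustrative helper; not part of raidtools.)
failed_disk_raidtab() {
    existing=${1:-/dev/hda2}   # partition that still holds the /usr data
    new=${2:-/dev/hdc2}        # empty partition on the new disk
    cat <<EOF
raiddev /dev/md0
    raid-level            1
    nr-raid-disks         2
    persistent-superblock 1
    chunk-size            4
    device                $existing
    failed-disk           0
    device                $new
    raid-disk             1
EOF
}

# With that entry saved as /etc/raidtab, the sequence described above is:
#   mkraid /dev/md0
#   mke2fs /dev/md0
#   mount /dev/md0 /mnt/newusr      (hypothetical temporary mountpoint)
#   cp -a /usr/. /mnt/newusr        (copy method is an assumption)
#   umount /usr
#   raidhotadd /dev/md0 /dev/hda2
```

Double-check which device carries the failed-disk line before running mkraid; swapping the two device lines is exactly the raidtab mistake that would overwrite the existing data.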
Listing 1. The /etc/raidtab with an entry for the /usr partition
    raiddev /dev/md0
        raid-level            1
        nr-raid-disks         2
        persistent-superblock 1
        chunk-size            4
        device                /dev/hda2
        raid-disk             0
        device                /dev/hdc2
        raid-disk             1
mdctl
The above example uses raidtools version 0.90.0 and relies on /proc/mdstat and the system logs to monitor the RAID-1 arrays. mdctl is another tool that can be used to create, control and monitor Linux md devices (aka RAID arrays). We did not use mdctl to create our RAID-1 arrays, but we did use it to learn more about our RAID devices. You can get mdctl-0.5 at www.cse.unsw.edu.au/~neilb/source/mdctl. To learn more about mdctl, search http://groups.google.com/ using the query +author:firstname.lastname@example.org +mdctl.
Use the command mdctl --examine /dev/hda2 to print the content of the md superblock on the device. Use the command mdctl --detail /dev/md0 to print the details of a given md array.
To deactivate the RAID-1 array, use the command raidstop /dev/md0. In some cases this does not work. For example, raidstop may report that the device is busy even though /dev/md0 is not mounted. In this case, mdctl --stop /dev/md0 may work. Note: unmount the RAID-1 device before you stop the array. If you don't, your system may hang. Deactivating the RAID device halts all synchronization. An inactive RAID device cannot be mounted.
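The precaution and the fallback described above can be combined in a small wrapper. This is only a sketch: the function names is_mounted and stop_array are our own, and it assumes that mounted devices are listed by device name in /proc/mounts (or in whatever file you pass as the second argument).

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]: succeed if DEVICE appears in the first
# column of MOUNTS_FILE (default /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# stop_array DEVICE [MOUNTS_FILE]: refuse to stop a mounted array;
# otherwise try raidstop first and fall back to mdctl --stop.
# (Illustrative helper names; not part of raidtools or mdctl.)
stop_array() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "$1 is still mounted; unmount it first" >&2
        return 1
    fi
    raidstop "$1" || mdctl --stop "$1"
}
```

For example, stop_array /dev/md0 returns a non-zero status without touching the array if /dev/md0 is still mounted.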
"Kernel Korner: The Linux RAID-1, 4, 5 Code", Linux Journal, December 1997.
Usenet; one of the archives is groups.google.com. The following search queries may help to get you started: +RAID1, +failed-disk +linux, +RAID1 +swap +linux and +linux +RAID +superblock.
If you're curious about the RAID superblock, you can find a description in the mdctl-0.5 source code. Take a look at the file md_p.h. You also can take a look at the kernel md driver source code files, including /usr/src/linux/drivers/md/md.c.
Thanks to those who developed the Linux RAID code (see drivers/md/md.c for names), Jakob Østergaard for "The Software-RAID HOWTO", the Usenet correspondents and Neil Brown for mdctl.
Joe Edwards, PE, PhD wrote his first useful program using FORTRAN on an IBM 370 almost 30 years ago. The program performed forensic analysis of X-ray diffraction data. He started using Linux in 1995. He is the lead programmer, sysadmin and dba for the GeneTests-GeneClinics Projects at the University of Washington.
Audin Malmin is a programmer and sysadmin for the GeneTests-GeneClinics Projects at the University of Washington. His first programming experiences were on his dad's Timex Sinclair 1000. He first experimented with Linux in 1996 on his 386sx16 with 3MB of RAM.
Ron Shaker is the lead programmer on the GeneSeek Project at the University of Washington. He has worked as a sysadmin, dba and systems engineer over the past 13 years and began using UNIX in 1988.