Increase Performance, Reliability and Capacity with Software RAID
In the late 1980s, processing power and memory performance were increasing by more than 40% each year. However, due to mechanical limitations, hard drive performance was not able to keep up. To prepare for a “pending I/O crisis”, some researchers at Berkeley proposed a solution called “Redundant Arrays of Inexpensive Disks”. The basic idea was to combine several drives so they appear as one larger, faster and/or more-reliable drive. RAID was, and still is, an effective way for working around the limitations of individual disk drives. Although RAID is typically implemented using hardware, Linux software RAID has become an excellent alternative. It does not require any expensive hardware or special drivers, and its performance is on par with high-end RAID controllers. Software RAID will work with any block device and supports nearly all levels of RAID, including its own unique RAID level (see the RAID Levels sidebar).
Most Linux distributions have built-in support for software RAID. This article uses the server edition of Ubuntu 8.04 (Hardy). Run the following commands as root to install the software RAID management tool (mdadm) and load the RAID kernel module:
# apt-get install mdadm # cat /proc/mdstat
Once you create an array, /proc/mdstat will show you many details about your software RAID configuration. Right now, you just want to make sure it exists to confirm that everything is working.
Many people like to add a couple drives to their computer for extra file storage, and mirroring (RAID 1) is an excellent way to protect that data. Here, you are going to create a RAID 1 array using two additional disks, /dev/sda and /dev/sdb.
Before you can create your first RAID array, you need to partition your disks. Use fdisk to create one partition on /dev/sda, and set its type to “Linux RAID autodetect”. If you are just testing RAID, you can create a smaller partition, so the creation process does not take as long:
# fdisk /dev/sda > n > p > 1 > <RETURN> > <RETURN> > t > fd > w
Now, you need to create an identical partition on /dev/sdb. You could create the partition manually using fdisk, but it's easier to copy it using sfdisk. This is especially true if you are creating an array using more than two disks. Use sfdisk to copy the partition table from /dev/sda to /dev/sdb, then verify that the partition tables are identical:
# sfdisk -d /dev/sda | sfdisk /dev/sdb # fdisk -l
Now, you can use your newly created partitions (/dev/sda1 and /dev/sdb1) to create a RAID 1 array called /dev/md0. The md stands for multiple disks, and /dev/mdX is the standard naming convention for software RAID devices. Use the following command to create the /dev/md0 array from /dev/sda1 and /dev/sdb1:
# mdadm -C /dev/md0 -l1 -n2 /dev/sda1 /dev/sdb1
When an array is first created, it automatically will begin initializing (also known as rebuilding). In your case, that means making sure the two drives are in sync. This process takes a long time, and it varies based on the size and type of array you created. The /proc/mdstat file will show you the progress and provide an estimated time of completion. Use the following command to verify that the array was created and to monitor its progress:
# watch cat /proc/mdstat # ctrl-c to exit
It is safe to use the array while it's rebuilding, so go ahead and create the filesystem and mount the drive. Don't forget to add /dev/md0 to your /etc/fstab file, so the array will be mounted automatically when the system boots:
# mkfs.ext3 /dev/md0 # mkdir /mnt/md0 # mount /dev/md0 /mnt/md0 # echo "/dev/md0 /mnt/md0 ext3 defaults 0 2" >> /etc/fstab
Once the array is finished rebuilding, you need to add it to the mdadm configuration file. This will make it easier to manage the array in the future. Each time you create or modify an array, update the mdadm configuration file using the following command:
# mdadm --detail --scan >> /etc/mdadm/mdadm.conf # cat /etc/mdadm/mdadm.conf
That's it. You successfully created a two-disk RAID 1 array using software RAID.
The entire point of a RAID 1 array is to protect against a drive failure, so you are going to simulate a drive failure for /dev/sdb and rebuild the array. To do this, mark the drive as failed, and then remove it from the array. If the drive actually failed, the kernel automatically would mark the drive as failed. However, it is up to you to remove the disk from the array before replacing it. Run the following commands to fail and remove the drive:
# mdadm /dev/md0 -f /dev/sdb1 # cat /proc/mdstat # mdadm /dev/md0 -r /dev/sdb1 # cat /proc/mdstat
Notice how /dev/sdb is no longer part of the array, yet the array is functional and all your data is still there. It is safe to continue using the array as long as /dev/sda does not fail. You now are free to shut down the system and replace /dev/sdb when it's convenient. In this case, pretend you did just that. Now that your new drive is in the system, format it and add it to the array:
# sfdisk -d /dev/sda | sfdisk /dev/sdb # mdadm /dev/md0 -a /dev/sdb1 # watch cat /proc/mdstat
The array automatically will begin rebuilding itself, and /proc/mdstat should indicate how long that process will take.
In addition to creating and rebuilding arrays, you need to be familiar with a few other tasks. It is important to understand how to start and stop arrays. Run the following commands to unmount and stop the RAID 1 array you created earlier:
# umount /dev/md0 # mdadm -S /dev/md0 # cat /proc/mdstat
As you can see, the array no longer is listed in /proc/mdstat. In order to start your array again, you need to assemble it (there isn't a start command). Run the following commands to assemble and remount your array:
# mdadm -As /dev/md0 # mount /dev/md0 # cat /proc/mdstat
Sometimes it's useful to place an array in read-only mode. Before you do this, you need to unmount the filesystem (you can't just remount as read-only). If you try to place an array in read-only mode while it is mounted, mdadm will complain that the device is busy:
# umount /dev/md0 # mdadm -o /dev/md0 # cat /proc/mdstat # mount /dev/md0
Notice how /dev/md0 is now read-only, and the filesystem was mounted as read-only automatically. Run the following commands to change the array and filesystem back to read-write mode:
# mdadm -w /dev/md0 # mount -o remount,rw /dev/md0
mdadm can be configured to send e-mail notifications regarding the status of your RAID arrays. Ubuntu automatically starts mdadm in monitoring mode for you; however, it currently is configured to send e-mail to the root user on the local system. To change this, edit /etc/mdadm/mdadm.conf and set MAILADDR to your e-mail address, then restart the mdadm dæmon:
# vim /etc/mdadm/mdadm.conf
Set MAILADDR to <your e-mail address>, and then do:
# /etc/init.d/mdadm restart
Run the following command to test your e-mail notification setup:
# mdadm --monitor --scan -t -1
If you are building a new server, you can use the Ubuntu Alternate install CD to set up your system on a software RAID array. If you don't have the luxury of performing a clean install, you can use the following process to convert your entire server to a RAID 1 array remotely. This requires your server to have an additional drive that is the same size or larger than the first disk. These instructions also assume you are using the server edition of Ubuntu 8.04 (Hardy), but the process is similar in other Linux distributions. You always should test a procedure like this before performing it on a production server.
Install mdadm and verify that the software RAID kernel module was loaded properly:
# apt-get install mdadm # cat /proc/mdstat
Copy the partition table from your first drive to your second drive, and set the partition types to “Linux RAID autodetect”:
# sfdisk -d /dev/sda | sfdisk /dev/sdb # fdisk /dev/sdb > t > 1 > fd > t > 2 > fd > w
Create the RAID 1 arrays for the root and swap partitions, and update the mdadm configuration file. This time, specify that the first drive is “missing”, which will delay the rebuild until you add the first drive to the array. You don't want to mess with the first drive until you are sure the RAID configuration is working correctly:
# mdadm -C /dev/md0 -n2 -l1 missing /dev/sdb1 # root # mdadm -C /dev/md1 -n2 -l1 missing /dev/sdb2 # swap # cat /proc/mdstat # mdadm --detail --scan >> /etc/mdadm/mdadm.conf
Modify /boot/grub/menu.lst so your server boots from the array:
# vim /boot/grub/menu.lst
Add fallback 1 to a new line after default 0.
Change the kopt line to # kopt=root=/dev/md0 ro.
Copy the first kernel entry and change (hd0,0) to (hd1,0).
Change root=xxx to root=/dev/md0 in the new kernel entry.
When your server is booting up, it needs to be able to load the RAID kernel modules and start your array. Use the following command to update your initrd file:
# update-initramfs -u
At this point, you can create and mount the filesystem, then copy your data to the additional drive. Make sure all of your applications are shut down and the server is idle; otherwise, you run the risk of losing any data modified after you run the rsync command:
# mkswap /dev/md1 # mkfs.ext3 /dev/md0 # mkdir /mnt/md0 # mount /dev/md0 /mnt/md0 # rsync -avx / /mnt/md0
To mount the RAID arrays automatically when your server reboots, modify /mnt/md0/etc/fstab and replace /dev/sda1 with /dev/md0, and replace /dev/sda2 with /dev/md1. You do this only on the second drive, in case you need to fall back to the old setup if something goes wrong:
# vim /mnt/md0/etc/fstab
Replace /dev/sda1 with /dev/md0.
Replace /dev/sda2 with /dev/md1.
Make sure GRUB is installed properly on both disks, and reboot the server:
# grub > device (hd0) /dev/sda > root (hd0,0) > setup (hd0) > device (hd0) /dev/sdb > root (hd0,0) > setup (hd0) > quit # reboot
When your server comes back on-line, it will be running on a RAID 1 array with only one drive in the array. To complete the process, you need to repartition the first drive, add it to the array, and make a few changes to GRUB. Make sure your server is functioning normally and all your data is intact before proceeding. If not, you run the risk of losing data when you repartition your disk or rebuild the array.
Use sfdisk to repartition the first drive to match the second drive. The --no-reread option is needed; otherwise, sfdisk will complain about not being able to reload the partition table and fail to run:
# sfdisk -d /dev/sdb | sfdisk /dev/sda --no-reread
Now that your first drive has the correct partition types, add it to both arrays. The arrays will start the rebuild process automatically, which you can monitor with /proc/mdstat:
# mdadm /dev/md0 -a /dev/sda1 # mdadm /dev/md1 -a /dev/sda2 # watch cat /proc/mdstat
Once the arrays have completed rebuilding, you safely can reconfigure GRUB to boot from the first drive. Although it is not required, you can reboot to make sure your server still will boot from the first drive:
# vim /boot/grub/menu.lst
Next, copy first kernel entry and change (hd1,0) to (hd0,0). Then:
That completes the process. Your server should be running on a RAID 1 array protecting your data from a drive failure.
As you can see, Linux software RAID is very flexible and easy to use. It can protect your data, increase server performance and provide additional capacity for storing data. Software RAID is a high-performance, low-cost alternative to hardware RAID and is a viable option for both desktops and servers.
RAID is extremely versatile and uses a variety of techniques to increase capacity, performance and reliability as well as reduce cost. Unfortunately, you can't quite have all those at once, which is why there are many different implementations of RAID. Each implementation, or level, is designed for different needs.
Mirroring (RAID 1):
Mirroring creates an identical copy of the data on each disk in the array. The array has the capacity of only a single disk, but the array will remain functional as long as at least one drive is still good. Read and write performance is the same or slightly slower than a single drive. However, read performance will increase if there are multiple read requests at the same time, because each drive in the array can handle a read request (parallel reads). Mirroring offers the highest level of reliability.
Striping (RAID 0):
Striping spreads the data evenly over all disks. This makes reads and writes very fast, because it uses all the disks at the same time, and the capacity of the array is equal to the total capacity of all the drives in the array. However, striping does not offer any redundancy, and the array will fail if it loses a single drive. Striping provides the best performance and capacity.
Striping with Parity (RAID 4, 5, 6):
In addition to striping, these levels add information that is used to restore any missing data after a drive failure. The additional information is called parity, and it is computed from the actual data. RAID 4 stores all the parity on a single disk; RAID 5 stripes the parity across all the disks, and RAID 6 stripes two different types of parity. Each copy of parity uses one disk worth of capacity in the array, so RAID 4 and 5 have the capacity of N-1 drives, and RAID 6 has the capacity of N-2 drives. RAID 4 and 5 require at least three drives and can lose any single disk in the array; whereas RAID 6 requires at least four drives and can withstand losing up to two disks. Write performance is relatively slow because of the parity calculations, but read performance is usually very fast as the data is striped across all of the drives. These RAID levels provide a nice blend of capacity, reliability and performance.
Nesting RAID Levels (RAID 10):
It is possible to create an array where the “drives” actually are other arrays. One common example is RAID 10, which is a RAID 0 array created from multiple RAID 1 arrays. Like RAID 5, it offers a nice blend of capacity, reliability and performance. However, RAID 10 gives up some capacity for increased performance and reliability, and it requires a minimum of four drives. The capacity of RAID 10 is half the total capacity of all the drives, because it uses a mirror for each “drive” in the stripe. It offers more reliability, because you can lose up to half the disks as long as you don't lose both drives in one of the mirrors. Read performance is excellent, because it uses striping and parallel reads. Write performance also is very good, because it uses striping and does not have to calculate parity like RAID 5. This RAID level typically is used for database servers because of its performance and reliability.
MD RAID 10:
This is a special RAID implementation that is unique to Linux software RAID. It combines striping and mirroring and is very similar to RAID 10 with regard to capacity, performance and reliability. However, it will work with two or more drives (including odd numbers of drives) and is managed as a single array. In addition, it has a mode (raid10,f2) that offers RAID 0 performance for reads and very fast random writes.
Spanning (Linear or JBOD):
Although spanning is not technically RAID, it is available on nearly every RAID card and software RAID implementation. Spanning appends disks together, so the data fills up one disk then moves on to the next. It does not increase reliability or performance, but it does increase capacity, and it typically is used to create a larger drive out of a bunch of smaller drives of varying sizes.
Original RAID Paper: www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-391.pdf
Linux RAID: linux-RAID.osdl.org/index.php/Linux_Raid
Why Software RAID?: linux.yyz.us/why-software-RAID.html
MD RAID 10: cgi.cse.unsw.edu.au/~neilb/01093607424
Will Reese has worked with Linux for the past ten years, primarily scaling Web applications running on Apache, Python and PostgreSQL. He currently is a developer at HushLabs working on www.natuba.com.