The Linux RAID-1, 4, 5 Code

A description of the implementation of the RAID-1, RAID-4 and RAID-5 personalities of the MD device driver in the Linux kernel, providing users with high performance and reliable, secondary-storage capability using software.

Using RAID (Redundant Array of Inexpensive Disks) is a popular way of improving system I/O performance and reliability. There are different levels of disk arrays that cover the whole range of possibilities for improving system I/O performance and increased reliability.

This report describes the current implementation of the RAID driver in the kernel, as well as the changes we made to the kernel to support new disk-array configurations that provide higher reliability.

The Multiple Devices (MD) Driver

The MD driver is used to group together a collection of block devices into a single, larger block device. Usually, a set of SCSI and IDE devices are configured into a single MD device. As found in the Linux 2.0 kernel, it is designed to re-map sector/device tuples into new sector/devices tuples in two different modes (personalities): linear (append mode) and striping (RAID-0 mode).

Linear mode is just a way of concatenating the contents of two smaller block devices into a larger device. This can be used to join together several small disks to create a larger disk. The size of the new disk is the sum of the smaller ones. For example, suppose we have two disks with 300 sectors each; after we configure them as linear MD devices, we have a new MD device that has 600 sectors: the sectors 0 to 299 of the device are mapped to the first disk and the sectors 300 to 599 are mapped to the second disk.

RAID-0 mode (also known as striping) is more interesting. This mode of operation writes the information to the device while distributing the information over the disks that are part of the disk array. Unlike linear mode, this is not just a concatenation of the disk-array components; striping balances the I/O load among the disks resulting in a high throughput. This is the personality chosen by most people who want speed.

Figure 1 shows how four disks are arranged in this mode. Shadowed regions are those that provide redundant information, and those stacked-up disks represent a single disk. As you can see there are no shadowed regions in the figure. What does this mean? Well, it means that if there is a hardware problem in any of the elements of the disk array, you lose all of your information.

Both the linear and the striping personalities lack any redundancy and error recovery modes. If any of the elements of the disk array fail, the contents of the complete MD device are useless, and there is little hope that any useful information can be recovered. This is similar to what happens with regular secondary storage devices—if it fails, you lose your information. However, with RAID-0, you have a higher risk of losing your information than with a regular disk. The higher failure rate is due to the fact that you have more disks and a failure in any of the disks make the RAID-0 contents unusable.

If you have a good backup strategy and you don't mind losing a day of work if any of your disks fail, using RAID-0 may be the best thing to do. For example, RAID-0 is used for newsgroups like comp.unix, but a higher reliability RAID level is used for important newsgroups like alt.binaries.pictures.furniture.

The way these two personalities are supported by the MD driver in the kernel is quite simple; the low level ll_rw_blk routine is responsible for putting block driver I/O requests on the system-request queue. This routine must be modified to call a mapping function that is part of the MD driver and is invoked whenever a request is issued for a block on a MD device.

The ll_rw_blk routine conceptually looks like this:

ll_rw_blk (blocks)
{
    sanity-checks ();
    for-each block in blocks {
        make_request (block);
    }
}

It is modified to support the striping (RAID-0) and linear personalities in this way:

ll_rw_blk (blocks)
{
    sanity-checks ();
    for-each block in blocks {
        if (block is-in md-device)
            md_map (block)
    }
    for-each block in blocks {
        make_request (block);
    }
}

Block re-mapping happens just before the input/output request is put into the system-request queue. This re-mapping function is quite simple. It is invoked with pointers to the device and to the block number, and all it does is change the device ID and the block number. The device ID is changed to point to one of the disks in the disk array, and the block number is changed to point to the proper location on that disk. Basically it is a nice hack (but it uses a couple of “ifdefs”, which we all know our fearless leader does not like).

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

error handing in RAID

Anonymous's picture

Actually my question is how the error handing is done in RAID and how we can check it means the different ways to check how the error handling is done?
If i am making two virtual devices and making RAID1 using these devices and writing some data and then corrupting the data on 1st disk.Then how error handing is done and is there any way to check how it is done and similarly with RAID5????

raid 1

pomps's picture

Hi,

How can I do the implementation of Raid 1 on Linux.

Scenario:

2 HD scsi Seagate 36.6 ULTRA 320 ST336607LC
1 Adaptec 29320A board

Could you please help me?

Thanks a lot

Pomps

Re: Kernel Korner: The Linux RAID-1, 4, 5 Code

Anonymous's picture

I have been using Linux MD RAID-1 for some time now and have been satisfied with its performance. I've lost two drives in this time and I feel that the simple addition of a software mirror was well worth it!

I am about to try RAID-5 in a few minutes and this article has left me feeling comfortable that I know what my kernel is doing. Thanks guys!

Are you sure the drives are

Anonymous's picture

Are you sure the drives are not dead because of miss-mapping by the md_map? I don't think this would cause any crashing, but if it is a member in a system drive array then I would think there might be a possibility of corruption and loss of md-status. I dont really know any of this, but it sure does look smart from this angle.
social cos(90)

Re: Kernel Korner: The Linux RAID-1, 4, 5 Code

Anonymous's picture

Fristy Pr0st!

UR the winningest

Anonymous's picture

UR the winningest OHHHHHHHHHHHHHHHHHHHHHH

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState