The Linux RAID-1, 4, 5 Code
Using RAID (Redundant Array of Inexpensive Disks) is a popular way of improving system I/O performance and reliability. Different levels of disk arrays cover the whole range of possibilities for improving I/O performance, reliability, or both.
This report describes the current implementation of the RAID driver in the kernel, as well as the changes we made to the kernel to support new disk-array configurations that provide higher reliability.
The MD driver is used to group a collection of block devices into a single, larger block device. Usually, a set of SCSI or IDE devices is configured into a single MD device. As found in the Linux 2.0 kernel, it is designed to re-map sector/device tuples into new sector/device tuples in two different modes (personalities): linear (append mode) and striping (RAID-0 mode).
Linear mode is just a way of concatenating the contents of two or more smaller block devices into a larger device. This can be used to join together several small disks to create a larger disk. The size of the new disk is the sum of the smaller ones. For example, suppose we have two disks with 300 sectors each; after we configure them as a linear MD device, we have a new MD device with 600 sectors: sectors 0 to 299 of the device are mapped to the first disk, and sectors 300 to 599 are mapped to the second disk.
RAID-0 mode (also known as striping) is more interesting. In this mode, data written to the device is distributed over the disks that make up the disk array. Unlike linear mode, this is not just a concatenation of the disk-array components; striping balances the I/O load among the disks, resulting in high throughput. This is the personality chosen by most people who want speed.

Figure 1 shows how four disks are arranged in this mode. Shadowed regions are those that provide redundant information, and each stack of blocks represents a single disk. As you can see, there are no shadowed regions in the figure. What does this mean? It means that if there is a hardware problem in any of the elements of the disk array, you lose all of your information.
Both the linear and the striping personalities lack any redundancy and error recovery modes. If any of the elements of the disk array fails, the contents of the complete MD device are useless, and there is little hope that any useful information can be recovered. This is similar to what happens with a regular secondary storage device: if it fails, you lose your information. However, with RAID-0 you run a higher risk of losing your information than with a single disk, because there are more disks and a failure in any one of them makes the RAID-0 contents unusable.
If you have a good backup strategy and you don't mind losing a day of work if any of your disks fail, using RAID-0 may be the best thing to do. For example, RAID-0 is used for newsgroups like comp.unix, but a higher reliability RAID level is used for important newsgroups like alt.binaries.pictures.furniture.
The way these two personalities are supported by the MD driver in the kernel is quite simple. The low-level ll_rw_blk routine is responsible for putting block-driver I/O requests on the system request queue. This routine is modified to call a mapping function that is part of the MD driver and is invoked whenever a request is issued for a block on an MD device.
The ll_rw_blk routine conceptually looks like this:
ll_rw_blk (blocks)
{
    sanity-checks ();
    for-each block in blocks {
        make_request (block);
    }
}
It is modified to support the striping (RAID-0) and linear personalities in this way:
ll_rw_blk (blocks)
{
    sanity-checks ();
    for-each block in blocks {
        if (block is-in md-device)
            md_map (block);
    }
    for-each block in blocks {
        make_request (block);
    }
}
Block re-mapping happens just before the I/O request is put into the system request queue. The re-mapping function is quite simple: it is invoked with pointers to the device and to the block number, and all it does is change the device ID and the block number. The device ID is changed to point to one of the disks in the disk array, and the block number is changed to point to the proper location on that disk. Basically, it is a nice hack (but it uses a couple of “ifdefs”, which, as we all know, our fearless leader does not like).
Comments
Error handling in RAID
Actually, my question is how error handling is done in RAID, and what are the different ways to check how it is done?
If I create two virtual devices, build a RAID-1 array from them, write some data and then corrupt the data on the first disk, how is the error handled? Is there any way to check how it is done, and similarly with RAID-5?
raid 1
Hi,
How can I implement RAID-1 on Linux?
Scenario:
2 HD scsi Seagate 36.6 ULTRA 320 ST336607LC
1 Adaptec 29320A board
Could you please help me?
Thanks a lot
Pomps
Re: Kernel Korner: The Linux RAID-1, 4, 5 Code
I have been using Linux MD RAID-1 for some time now and have been satisfied with its performance. I've lost two drives in this time and I feel that the simple addition of a software mirror was well worth it!
I am about to try RAID-5 in a few minutes and this article has left me feeling comfortable that I know what my kernel is doing. Thanks guys!
Are you sure the drives are not dead because of mis-mapping by md_map? I don't think this would cause a crash, but if the drive is a member of a system drive array, I would think there might be a possibility of corruption and loss of MD status. I don't really know any of this, but it sure does look smart from this angle.
social cos(90)