Increase Performance, Reliability and Capacity with Software RAID
In the late 1980s, processing power and memory performance were increasing by more than 40% each year. However, due to mechanical limitations, hard drive performance was not able to keep up. To prepare for a “pending I/O crisis”, some researchers at Berkeley proposed a solution called “Redundant Arrays of Inexpensive Disks”. The basic idea was to combine several drives so they appear as one larger, faster and/or more-reliable drive. RAID was, and still is, an effective way for working around the limitations of individual disk drives. Although RAID is typically implemented using hardware, Linux software RAID has become an excellent alternative. It does not require any expensive hardware or special drivers, and its performance is on par with high-end RAID controllers. Software RAID will work with any block device and supports nearly all levels of RAID, including its own unique RAID level (see the RAID Levels sidebar).
Most Linux distributions have built-in support for software RAID. This article uses the server edition of Ubuntu 8.04 (Hardy). Run the following commands as root to install the software RAID management tool (mdadm) and load the RAID kernel module:
# apt-get install mdadm # cat /proc/mdstat
Once you create an array, /proc/mdstat will show you many details about your software RAID configuration. Right now, you just want to make sure it exists to confirm that everything is working.
Many people like to add a couple drives to their computer for extra file storage, and mirroring (RAID 1) is an excellent way to protect that data. Here, you are going to create a RAID 1 array using two additional disks, /dev/sda and /dev/sdb.
Before you can create your first RAID array, you need to partition your disks. Use fdisk to create one partition on /dev/sda, and set its type to “Linux RAID autodetect”. If you are just testing RAID, you can create a smaller partition, so the creation process does not take as long:
# fdisk /dev/sda > n > p > 1 > <RETURN> > <RETURN> > t > fd > w
Now, you need to create an identical partition on /dev/sdb. You could create the partition manually using fdisk, but it's easier to copy it using sfdisk. This is especially true if you are creating an array using more than two disks. Use sfdisk to copy the partition table from /dev/sda to /dev/sdb, then verify that the partition tables are identical:
# sfdisk -d /dev/sda | sfdisk /dev/sdb # fdisk -l
Now, you can use your newly created partitions (/dev/sda1 and /dev/sdb1) to create a RAID 1 array called /dev/md0. The md stands for multiple disks, and /dev/mdX is the standard naming convention for software RAID devices. Use the following command to create the /dev/md0 array from /dev/sda1 and /dev/sdb1:
# mdadm -C /dev/md0 -l1 -n2 /dev/sda1 /dev/sdb1
When an array is first created, it automatically will begin initializing (also known as rebuilding). In your case, that means making sure the two drives are in sync. This process takes a long time, and it varies based on the size and type of array you created. The /proc/mdstat file will show you the progress and provide an estimated time of completion. Use the following command to verify that the array was created and to monitor its progress:
# watch cat /proc/mdstat # ctrl-c to exit
It is safe to use the array while it's rebuilding, so go ahead and create the filesystem and mount the drive. Don't forget to add /dev/md0 to your /etc/fstab file, so the array will be mounted automatically when the system boots:
# mkfs.ext3 /dev/md0 # mkdir /mnt/md0 # mount /dev/md0 /mnt/md0 # echo "/dev/md0 /mnt/md0 ext3 defaults 0 2" >> /etc/fstab
Once the array is finished rebuilding, you need to add it to the mdadm configuration file. This will make it easier to manage the array in the future. Each time you create or modify an array, update the mdadm configuration file using the following command:
# mdadm --detail --scan >> /etc/mdadm/mdadm.conf # cat /etc/mdadm/mdadm.conf
That's it. You successfully created a two-disk RAID 1 array using software RAID.
The entire point of a RAID 1 array is to protect against a drive failure, so you are going to simulate a drive failure for /dev/sdb and rebuild the array. To do this, mark the drive as failed, and then remove it from the array. If the drive actually failed, the kernel automatically would mark the drive as failed. However, it is up to you to remove the disk from the array before replacing it. Run the following commands to fail and remove the drive:
# mdadm /dev/md0 -f /dev/sdb1 # cat /proc/mdstat # mdadm /dev/md0 -r /dev/sdb1 # cat /proc/mdstat
Notice how /dev/sdb is no longer part of the array, yet the array is functional and all your data is still there. It is safe to continue using the array as long as /dev/sda does not fail. You now are free to shut down the system and replace /dev/sdb when it's convenient. In this case, pretend you did just that. Now that your new drive is in the system, format it and add it to the array:
# sfdisk -d /dev/sda | sfdisk /dev/sdb # mdadm /dev/md0 -a /dev/sdb1 # watch cat /proc/mdstat
The array automatically will begin rebuilding itself, and /proc/mdstat should indicate how long that process will take.
|Speed Up Your Web Site with Varnish||Jun 19, 2013|
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
- Speed Up Your Web Site with Varnish
- Containers—Not Virtual Machines—Are the Future Cloud
- Linux Systems Administrator
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Non-Linux FOSS: libnotify, OS X Style
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- RSS Feeds
- Reply to comment | Linux Journal
52 min 56 sec ago
- Reply to comment | Linux Journal
4 hours 52 min ago
- Yeah, user namespaces are
6 hours 8 min ago
- Cari Uang
9 hours 40 min ago
- user namespaces
12 hours 33 min ago
12 hours 59 min ago
- One advantage with VMs
15 hours 28 min ago
- about info
16 hours 1 min ago
16 hours 2 min ago
16 hours 3 min ago
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?