High Availability Linux with Software RAID
Creating a bootable recovery CD is easy with the mkbootdisk utility. To include the /boot partition on the recovery CD, however, a small patch needs to be applied to mkbootdisk (Listing 1). You also must have the mkisofs package installed. The following commands, issued as root, take care of this:
cd /sbin
cp mkbootdisk mkbootdisk.orig
patch -p0 < mkbootdisk.patch
After the patch is applied, the following command creates the bootable recovery CD:
cd /tmp
mkbootdisk --device bootcd.iso --iso 2.4.18-14
When using the --iso option, the specified --device is expected to be a filename to which an ISO image will be written. The last parameter, 2.4.18-14, specifies which kernel to use.
We can check the ISO image by using the following commands:
cd /tmp
losetup /dev/loop1 bootcd.iso
mount /dev/loop1 /mnt
These commands attach the ISO image to a loopback device and then mount that device on /mnt. Upon inspection, you should see the complete /boot directory on the CD image.
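The inspection can also be scripted. The following is a minimal sketch, not part of the original procedure: check_boot_image is a hypothetical helper name, and it assumes the kernel image is stored as boot/vmlinuz-<version> under the mount point, which is the usual Red Hat naming but is not confirmed by the article.

```shell
# check_boot_image MOUNTPOINT KERNEL_VERSION  (hypothetical helper)
# Succeeds if MOUNTPOINT/boot exists and holds a kernel image for the
# given version. The vmlinuz-<version> file name is an assumption.
check_boot_image() {
    if [ ! -d "$1/boot" ]; then
        echo "missing: $1/boot" >&2
        return 1
    fi
    if [ ! -f "$1/boot/vmlinuz-$2" ]; then
        echo "missing: $1/boot/vmlinuz-$2" >&2
        return 1
    fi
    echo "ok: $1/boot contains vmlinuz-$2"
}
```

With the image mounted as above, it would be called as check_boot_image /mnt 2.4.18-14.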
For a physical machine, this image would be burned onto a CD. For the purposes of testing, VMware can use an ISO image directly as a virtual CD-ROM drive.
Now for the fun part. One of the advantages of using VMware for testing is the ability to fail hardware without having to worry about possible repercussions to physical hardware. In order to ensure that the system behaves as expected, I ran two failure tests: failing a pure RAID drive and failing a mixed native and RAID drive.
To fail a drive under VMware, I simply shut down the VM, move the files representing a particular virtual drive to a backup folder and re-create a fresh virtual drive. This process effectively creates a fresh unpartitioned drive--exactly what the situation would be if a drive had failed and been replaced.
For the first test, I "failed" the fourth drive in the array. After a successful boot in the VM, I looked at /proc/mdstat:
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
1027584 blocks level 5, 64k chunk, algorithm 0 [5/4] [UU_UU]
md1 : active raid5 sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
44780800 blocks level 5, 64k chunk, algorithm 0 [6/5] [UUU_UU]
It is a little counterintuitive, but the status is indicated starting with the lower drive numbers from left to right. So, for md0, [UU_UU] indicates that drives 0 and 1 are up, drive 2 is down and drives 3 and 4 are up. These correlate to sdb1, sdc1, sdd1, sde1 and sdf1, respectively. For md1, [UUU_UU] indicates that drives 0 through 2 are up, drive 3 is down and drives 4 and 5 are up. These correlate to sda2, sdb2, sdc2, sdd2, sde2 and sdf2, respectively.
As we would expect, the sdd drive has failed. At this point the RAID is running in degraded mode. If another drive were to fail, there would be data loss.
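The left-to-right reading of the status string can be sketched as a small shell function. This is an illustration only: down_members is a hypothetical name, and the function looks at nothing but the bracketed string, so mapping a position back to a device still means finding the entry with that number in brackets (for example, sdd1[2]) in the listing.

```shell
# down_members STATUS  (hypothetical helper)
# Prints the zero-based position of every "_" (down) member in a status
# string such as "[UU_UU]", one position per line.
down_members() {
    status=${1#\[}       # strip the leading "["
    status=${status%\]}  # strip the trailing "]"
    index=0
    while [ -n "$status" ]; do
        rest=${status#?}         # everything after the first character
        first=${status%"$rest"}  # the first character itself
        if [ "$first" = "_" ]; then
            echo "$index"
        fi
        status=$rest
        index=$((index + 1))
    done
}
```

For the output above, down_members '[UU_UU]' prints 2 and down_members '[UUU_UU]' prints 3.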
We can reintegrate the "new" drive into the array while the system is running. To do this, we need to partition the drive and use the raidhotadd utility. The drive should be partitioned exactly as it was originally. For this drive, both partitions are of type Linux raid autodetect (fd). After the drive is repartitioned, execute the following commands:
raidhotadd /dev/md0 /dev/sdd1
raidhotadd /dev/md1 /dev/sdd2
cat /proc/mdstat
Afterward, you should see output similar to the following:
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
1027584 blocks level 5, 64k chunk, algorithm 0 [5/4] [UU_UU]
[===>.................] recovery = 18.3% (47816/256896)
finish=0.5min speed=6830K/sec
md1 : active raid5 sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
44780800 blocks level 5, 64k chunk, algorithm 0 [6/5] [UUU_UU]
When the sync process is finished for md0, a similar process begins for md1. When completed, you should see that /proc/mdstat appears as it did earlier (with all the Us present) and that the array is no longer in degraded mode.
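Rather than reading /proc/mdstat by eye, the check for missing Us can be sketched with awk. This is a rough illustration under the assumption that the file keeps the layout shown in the listings above; degraded_arrays is a hypothetical name.

```shell
# degraded_arrays [FILE]  (hypothetical helper)
# Prints the name of every md array whose status string contains "_".
# Reads /proc/mdstat unless another file is given.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md0 : active raid5 ..."
        /^md[0-9]+ *:/ { name = $1; sub(/:.*/, "", name) }
        # A status string like [UU_UU] with at least one "_" means degraded
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    ' "${1:-/proc/mdstat}"
}
```

An empty result means every array shows only Us; while a rebuild is still running, the array being rebuilt continues to be listed.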
For the second test, I "failed" the first drive in the array. For this test, we must have the bootable CD-ROM created earlier. It either can be burned onto a CD or the file can be referenced in the VMware configuration (Figure 4).
When you boot off the CD, the welcome screen created by the mkbootdisk script appears (Figure 5). The boot fails part way through when the system attempts to mount the /boot partition. This is because the drive /dev/sda1 is not available. Enter the root password to get to maintenance mode, and then edit the filesystem table file using the command vi /etc/fstab. For now, simply comment out the line that contains the /boot entry. On my installation, the fstab file had a label reference for the /boot entry. I prefer to reference the drive directly, so I changed this entry to /dev/sda1 and then commented it out. Type exit and the system reboots, again booting off the CD. This time, it is able to start up completely.
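For those who would rather not edit the file by hand, the same change can be sketched with sed. This is only an illustration: comment_out_boot is a hypothetical name, and it assumes that the mount point is the second whitespace-separated field of each fstab line. It writes the modified table to standard output and leaves the original file untouched.

```shell
# comment_out_boot FILE  (hypothetical helper)
# Prints FILE with a "#" placed in front of any uncommented entry whose
# mount point (second field) is /boot. FILE itself is not modified.
comment_out_boot() {
    sed -E 's,^([^#[:space:]][^[:space:]]*[[:space:]]+/boot[[:space:]]),#\1,' "$1"
}
```

A cautious way to use it is to redirect the output to a scratch file, for example comment_out_boot /etc/fstab > /tmp/fstab.new, review the result and only then copy it over /etc/fstab.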
Inspecting /proc/mdstat, as before, should show that the md1 RAID volume is running in degraded mode. The tasks to restore the failed first drive are as follows:
Partition the drive.
Use the raidhotadd utility to rebuild the md1 RAID.
Format the native partition on the drive.
Copy the /boot files from the CD to the drive.
Uncomment the /boot entry in /etc/fstab.
Install the GRUB boot loader in the MBR (master boot record) of the drive.
The drive should be partitioned exactly as it was originally. That is, the first 250MB partition should be type Linux (83), and the second 8750MB partition should be type Linux raid autodetect (fd). You can then enter the command:
raidhotadd /dev/md1 /dev/sda2
to rebuild the md1 RAID. Inspect /proc/mdstat as before to check on the status of the synchronization process.
The native partition should be formatted with the command mke2fs /dev/sda1. Assuming that the CD-ROM drive is mounted on /mnt/cdrom, the following commands restore the /boot partition:
mount /dev/sda1 /boot
cp -p -r /mnt/cdrom/boot/* /boot
Next, edit /etc/fstab and uncomment the line containing the /boot partition. Finally, use GRUB to install the boot loader on the drive's MBR. A thorough discussion of GRUB is outside the scope of this article, but the following commands use the original GRUB configuration defined when Red Hat 8.0 was installed:
grub
root (hd0,0)
setup (hd0)
quit
Once the md1 RAID is rebuilt, the system is ready to be rebooted without the recovery CD. Make sure the recovery CD is removed from the CD-ROM drive or that the image reference in the VMware configuration is removed, and reboot. The system should come up normally. A look at /proc/mdstat should show both RAID volumes, with all members up and running.
Comments
backup of /boot partition and MBR on second disk rather than CD?
Is there any reason that the following wouldn't work as an
alternative to using a boot CD to back up the
/boot partition and master boot record in case of a failure on the first disk:
(Can BIOSes typically boot from a second disk?)
edit which /etc/fstab?
In your second test and its recovery steps, you say to edit
/etc/fstab to comment out the /boot entry.
Does the boot fail after the RAID drivers/modules are
loaded, so that the volume containing
/etc/fstab is available?
Re: High Availability Linux with Software RAID
Using soft RAID for swap is a waste of CPU;
Linux can do the same without soft RAID:
just add the option "pri=1" to all swap entries in /etc/fstab
and Linux will use them as if they were part of a striped soft RAID.
Re: High Availability Linux with Software RAID
In case of a _real_ drive failure, Linux can (and, IMHO, will in 99.99% of cases) panic.
Why?
In our case the drive didn't respond, the stupid SCSI driver tried to reset the SCSI adapter, then the kernel died...
Certainly, this is far better than losing the _full_ filesystem, but...
Hardware RAID is the _only_ choice for servers ...
Re: High Availability Linux with Software RAID
I've used RAID and forced failures in dozens of ways and NEVER had a kernel panic. This is with Adaptec and Symbios controllers, and with basically unplugging the drive from a hot shoe while the server was running and serving requests (in a test environment, as well as actual failures in the real environment).
I did however, know a coworker using a HW RAID controller who had it mark two disks bad because the cable to them had slipped off while the server was being moved. Guess who had to rebuild and restore his whole RAID array because his $1000 RAID card wouldn't let him restart the RAID5 in place due to two bad drives.
P.S. The CPU load on my dual PIII 750 running flat out accessing its RAID arrays is about 1% of a CPU. If you have to worry about 1% of your CPU, you have a lot of things on your plate ahead of that.
Re: High Availability Linux with Software RAID
I don't think the author was trying to achieve maximum performance, but rather guaranteed availability.
If you use the partitions directly in the fstab with pri=1 and a drive fails, then the machine will probably go down, since a portion of the swap space is now corrupt. However, if they are on a RAID 5 setup, the machine will just keep on humming, assuming you don't have a second drive failure.
Re: High Availability Linux with Software RAID
Yes, you could do that, but then you lose HA, because swap will fail
as soon as a disk with a swap partition fails.
Performance-wise, it would be better to use RAID 1 than RAID 5 for swap.
Re: High Availability Linux with Software RAID
Does anybody have info on how to do this using User Mode Linux?
Re: High Availability Linux with Software RAID
UML is part of the kernel, so it is not affected by the RAID subsystem underneath it. You just need to set up the RAID disk system as explained and then install a UML kernel, and away you go.
Re: High Availability Linux with Software RAID
That's not totally true ...
A bug in the ubd driver in UML prevents raidhotadd from working correctly. The bug is known, and a patch is available to fix it (it will be in the next UML release).
greetings,
frank
Re: High Availability Linux with Software RAID
If one is looking to run a truly HA server, would it not be better to make /boot a RAID-1 array and use a ramdisk to boot the machine and allow access to the software RAID? Also, for better performance of the swap partition, rather than creating a software RAID disk for swap, set all the relevant partitions to swap space and give them equal priority in /etc/fstab so that they are used as a RAID-0 array, without the overhead of the software RAID system running.
Re: High Availability Linux with Software RAID
Having swap on RAID is a good idea, otherwise a single disk error
can make your machine crash.
I would tend to disagree with
I would tend to disagree with the whole concept of placing your swap on a raid partition.
See line #18 in the link below for more information:
http://linas.org/linux/Software-RAID/Software-RAID-8.html
We're not talking about strip
We're not talking about striping though, but mirroring, so if one drive dies, all the data written to swap doesn't go down with it, as that would be double plus ungood.