Recovery of RAID and LVM2 Volumes

RAID and logical volume managers are great, until you lose data.

Restoring Access to the RAID Array Members

To recover, the first thing to do is to move the drive to another machine. You can do this easily by putting the drive in a USB2 hard-drive enclosure; it then shows up as a SCSI hard disk device, for example /dev/sda, when you plug it into your recovery computer. This also reduces the risk of damaging the recovery machine while attempting to install the hardware from the original computer.

The challenge then is to get the RAID setup recognized and to gain access to the logical volumes within. You can use sfdisk -l /dev/sda to check that the partitions on the old drive are still there.
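
As a quick sanity check, assuming the drive really did appear as /dev/sda, list the partition table and confirm that the original partitions survived; software-RAID member partitions are normally type fd (Linux raid autodetect):

# The original partitions should still be listed, with Id fd for the RAID members.
sfdisk -l /dev/sda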

To get the RAID setup recognized, use mdadm to scan the devices for their raid volume UUID signatures, as shown in Listing 3.
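
Listing 3 is not reproduced in this section, but each RAID volume the scan finds produces an entry of roughly this shape, with the member device reported on its own line (the UUID here is only a placeholder; yours will differ):

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
   devices=/dev/sda1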

This format is very close to the format of the /etc/mdadm.conf file that the mdadm tool uses. You need to redirect the output of mdadm to a file, join the devices lines onto the ARRAY lines, and put in a nonexistent second device to complete each two-device RAID1 configuration. Bringing the md array up in degraded mode will then allow data recovery:

[root@recoverybox ~]# mdadm --examine --scan /dev/sda1 /dev/sda2 /dev/sda3 >> /etc/mdadm.conf
[root@recoverybox ~]# vi /etc/mdadm.conf

Edit /etc/mdadm.conf so that the devices statements are on the same lines as the ARRAY statements, as they are in Listing 4. Add the “missing” device to the devices entry for each array member to fill out the raid1 complement of two devices per array. Don't forget to renumber the md entries if the recovery computer already has md devices and ARRAY statements in /etc/mdadm.conf.
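
Listing 4 is not shown here, but after editing, each entry in /etc/mdadm.conf ends up looking something like this (the UUIDs must be the ones your own scan reported; these are placeholders):

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd devices=/dev/sda1,missing
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=bbbbbbbb:cccccccc:dddddddd:eeeeeeee devices=/dev/sda2,missing
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=cccccccc:dddddddd:eeeeeeee:ffffffff devices=/dev/sda3,missing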

Then, activate the new md devices with mdadm -A -s, and check /proc/mdstat to verify that the RAID arrays are active. Listing 5 shows how the RAID array should look.
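
In shell form, the activation and check are simply the following (no output shown here; in /proc/mdstat, the detail to look for is the [2/1] [U_] marker, a two-way mirror running on one healthy half, which is exactly the degraded state we expect):

# Assemble every array described in /etc/mdadm.conf, then inspect the result.
mdadm -A -s
cat /proc/mdstat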

If md devices show up in /proc/mdstat, all is well, and you can move on to getting the LVM volumes mounted again.

Recovering and Renaming the LVM2 Volume

The next hurdle is that the system now has two sets of LVM2 disks, each containing a volume group named VolGroup00. Typically, the vgchange -a y command would allow LVM2 to recognize a new volume group. That won't work if devices containing identical volume group names are present, though. Issuing vgchange -a y will report that VolGroup00 is inconsistent, and the VolGroup00 on the RAID device will be invisible. To fix this, you need to rename the volume group that you are about to mount on the system by hand-editing its LVM configuration file.
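
Before renaming anything, it can help to see both volume groups and the UUIDs that tell them apart. This diagnostic step is not from the article, and LVM2 will complain about the duplicate names while it runs, but it should still list both:

# Show each physical volume, the volume group it belongs to, and that group's UUID.
pvs -o pv_name,vg_name,vg_uuid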

If you made a backup of the files in /etc on raidbox, you can edit a copy of the file /etc/lvm/backup/VolGroup00 so that it reads VolGroup01, or RestoreVG, or whatever you want it to be named on the system you are going to restore under. Be sure to change the volume group name inside the file as well; renaming the file alone is not enough.
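
With such a backup in hand, the whole edit can be done mechanically. This is only a sketch; /tmp/raidbox-etc stands in for wherever you stashed your copy of raidbox's /etc tree:

# Rename every occurrence of the volume group and save the result under the new name.
sed -e 's/VolGroup00/VolGroup01/g' /tmp/raidbox-etc/lvm/backup/VolGroup00 > /etc/lvm/backup/VolGroup01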

If you don't have a backup, you can re-create the equivalent of an LVM2 backup file by examining the LVM2 metadata area on the disk and editing out the binary parts. LVM2 typically keeps copies of the metadata configuration near the beginning of the device, in the 255 sectors that follow the first sector; see /etc/lvm/lvm.conf and man lvm.conf for more details. Because each disk sector is typically 512 bytes, reading this area yields a file of roughly 128KB. LVM2 may have stored several different text representations of its configuration within that area. Extract the area to an ordinary file as follows, then edit the file:

# Copy the LVM2 metadata area (255 sectors of 512 bytes, skipping the first
# sector of the device) into a scratch file, then open it in an editor.
dd if=/dev/md2 bs=512 count=255 skip=1 of=/tmp/md2-raw-start
vi /tmp/md2-raw-start

You will see some binary gibberish, but look for the bits of plain text. LVM treats this metadata area as a ring buffer, so there may be multiple configuration entries on the disk. On my disk, the first entry had only the details for the physical volume and volume group, and the next entry had the logical volume information. Look for the block of text with the most recent timestamp, and edit out everything except the block of plain text that contains the LVM declarations. This block holds the volume group declaration, including the logical volume information. Fix up the physical device declarations if needed. If in doubt, look at the existing /etc/lvm/backup/VolGroup00 file to see what is there. On disk, the text entries are not as nicely formatted and are in a different order than in the normal backup file, but they will do. Save the trimmed configuration as VolGroup01. This file should then look like Listing 6.
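
If hand-deleting binary noise in vi is tedious, one possible shortcut (not from the article, and it discards the original byte layout, so keep the raw dump around) is to let strings pull out only the printable text first:

# Keep only printable runs of eight or more characters, then trim by hand as above.
strings -n 8 /tmp/md2-raw-start > /tmp/md2-metadata.txt
vi /tmp/md2-metadata.txt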

______________________

Comments


Thanks

Anonymous

Count another life saved. In spite of destroying one HD out of a two HD LVM set, we still recovered some data thanks to these tips. Not too shabby.

Alternate recovery method

Garth Webb

In my situation I did not have any explicit RAID arrays, just the standard Red Hat FC5 configuration of a single VolGroup00 volume group with two logical volumes. I recently pulled that 20GB drive from my system, installed a 320GB drive in its place and reinstalled FC5. Because the new install created the same VolGroup00 volume group, I could not mount the 20GB drive I had in a USB enclosure.

It seemed that most of this article was aimed at teasing out the LVM metadata and rewriting it to effect a volume group name change. Since all of the LVM tools require that you address the volume groups on your physical drives by their name, you encounter the naming conflict (how hard would it have been to include a rename command that took a physical path and renamed the volume there?).

Rather than fight that battle, I booted off my FC5 rescue CD (or any bootable tools CD with the LVM tools on it) and did a:

vgrename VolGroup00 Seagate320

The naming conflict didn't really matter here, it seems. It just renamed whatever VolGroup00 it found first, which happened to be my new 320GB drive. I could then activate both with:

vgchange -ay

and then mount the volumes and copy, etc.
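
For completeness, the last step is just an ordinary mount; the mount point and logical volume name below are only examples (FC5's default root LV is usually LogVol00):

mkdir -p /mnt/old20gb
mount /dev/VolGroup00/LogVol00 /mnt/old20gb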

Nice recovery method

rbulling

This looks like a much simpler way to do things, as long as you are not dealing with software RAID.

One thing you'd need to be careful about is making sure that you leave the new VolGroup00 named VolGroup00 at the end of the recovery process.

I suspect that using vgrename / vgscan / vgchange in combination would allow you to rename both volume groups to something else, then rename the newest VolGroup00 back to VolGroup00, so that the system would continue to work on boot.

You could probably use the same technique after you recovered the RAID configuration, and avoid the messy surgery on raw disk information. Next time I encounter this issue, I'll give that a try.
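
A sketch of that idea (untested): vgrename also accepts a volume group UUID in place of the name, which sidesteps the which-VolGroup00-do-you-mean ambiguity entirely.

vgs -o vg_name,vg_uuid                 # note the UUID of each VolGroup00
vgrename <uuid-of-recovered-vg> RestoreVG
vgscan
vgchange -a y RestoreVG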

With my distro (Ubuntu), and

Anonymous

With my distro (Ubuntu), and I suspect many others these days, drives are actually mounted by UUID, not by device name *or* by VG/LV name. So renaming the VG, even the one containing root, should* not be a problem.

What I did to recover a RAID1/LVM stack that blew one of its drives was to boot the Ubuntu desktop live CD, sudo apt-get install mdadm lvm2, then mdadm --assemble /dev/mdX. (Be careful of RAID device names that occur in multiples; that is, run this on the recovery PC with the old drives attached and the recovery PC's own drives disconnected. It will simply scan the attached drives for appropriate partitions.) The LVM volumes then simply appear, and you can vgrename easily if necessary. Once you've done that, connect all the drives and reboot, and after the mdadm.conf magic you will have all your data.
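
In command form, that session looks roughly like this; it is a sketch, not a transcript, and the device names and RescueVG are illustrative:

sudo apt-get install mdadm lvm2
sudo mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1   # or: sudo mdadm --assemble --scan
sudo vgscan                                          # the recovered volume group should appear
sudo vgrename VolGroup00 RescueVG                    # only needed if the name collides
sudo vgchange -a y RescueVG
sudo mount /dev/RescueVG/LogVol00 /mnt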

Incidentally, on my home server, RAID1 OS disks, RAID5 storage disks, and LVM over the top, hell yeah that makes sense. On my desktop, not so much.

* Yeah. I haven't tested this explicitly.

Thanks!

Juan

This article saved my day!

Thanks!

Anders Båtstrand

This worked great for me. Thanks for putting it together!

Doesn't quite work for me

jweage

I just ran into a similar problem attempting to move a disk from one machine to another, with both disks configured as VolGroup00. I worked through your example, but when it came to restoring VolGroup01 (Listing 6), vgcfgrestore refused, claiming it couldn't find a contents line. In my dump, there are five additional header lines before the VolGroup01 { line, which vgcfgrestore requires.

After I figured this out and restored the volume group, I could not get any logical volumes to show up. lvscan did not pick up the three logical volumes in the volume group! Those were also in the dd-extracted file, so I had to add all of that back into the config file and do another vgcfgrestore and vgchange.
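
For reference, the header lines vgcfgrestore wants are the same ones that appear at the bottom of Listing 6 below; placed at the top of the file, they look like this (values taken from that listing; substitute your own):

contents = "Text Format Volume Group"
version = 1
description = ""
creation_host = "localhost.localdomain"
creation_time = 1139180239    # Sun Feb 5 22:57:19 2006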

This is really disconcerting, as this is likely to be a common problem. Unfortunately it seems that LVM is NOT the way to go for the typical workstation, unless someone really needs the ability to resize a volume.

Correction to Listing 6

rbulling

It appears that Listing 6 got truncated somewhere along the line before publication.

The full Listing 6 should be:


Listing 6: Modified Volume Group Configuration File

VolGroup01 {
    id = "xQZqTG-V4wn-DLeQ-bJ0J-GEHB-4teF-A4PPBv"
    seqno = 1
    status = ["RESIZEABLE", "READ", "WRITE"]
    extent_size = 65536
    max_lv = 0
    max_pv = 0

    physical_volumes {

        pv0 {
            id = "tRACEy-cstP-kk18-zQFZ-ErG5-QAIV-YqHItA"
            device = "/dev/md2"

            status = ["ALLOCATABLE"]
            pe_start = 384
            pe_count = 2365
        }
    }

    # Generated by LVM2: Sun Feb 5 22:57:19 2006
    logical_volumes {

        LogVol00 {
            id = "i17qXJ-Blzu-u1Dr-bSlR-0kNC-yuBH-lnbkSi"
            status = ["READ", "WRITE", "VISIBLE"]
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 2364

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 0
                ]
            }
        }
    }
}

contents = "Text Format Volume Group"
version = 1

description = ""

creation_host = "localhost.localdomain" # Linux localhost.localdomain 2.6.9-11.EL #1 Wed Jun 8 20:20:13 CDT 2005 i686
creation_time = 1139180239 # Sun Feb 5 22:57:19 2006

what if the machine your

dave

What if the machine you're using for recovery has RAID itself? When you append to mdadm.conf, can md0, md1 and md2 be renumbered to md3, md4 and md5?

Renumbering md0 to md3, for example, works

rbulling

You should be able to do that without any problems, as long as you explicitly keep the UUID signature in the renamed device line.
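
For example (the UUID shown is a placeholder; keep the real one from your scan):

ARRAY /dev/md3 level=raid1 num-devices=2 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd devices=/dev/sda1,missing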

Experienced this exact

Neekofab

I experienced this exact problem. I moved an md0/md1 disk to a recovery workstation that already had md0/md1 devices. They could not coexist, and I could not find a way to move the additional md0/md1 devices to md2/md3. I ended up disconnecting the system's md0/md1 devices, booting up with sysresccd and shoving the data over the network.

bleah

I ran into the same issue

Anonymous

I ran into the same issue and solved it with a little reading about mdadm. All you have to do is create a new array from the old disks.

# MAKEDEV md1
# mdadm -C /dev/md1 -l 1 -n 2 missing /dev/sdb1

Voila. Your raid array has now been moved from md0 to md1.
