Recovering from a Hard Drive Failure
Have you ever woken up in the morning and said to yourself, “Today is the day that I'm finally going to back up my workstation!” only to find out that you're a day late and about 320GB short? Well, that's about what happened to me recently, but don't worry, the story has a happy ending. I'm getting ahead of myself, though.
Most people's excuse for not performing routine maintenance or regular backups is that they just don't have time. So when I discovered that I had some down time, I decided to take care of a few issues on my workstation. First, I performed a system update. Then, since I leave my system on all the time, I decided to upgrade the kernel and try to get software suspend working so I could cut down on energy consumption and heat production in my office. Finally, I resolved to finish backing up my home directory.
The system update went without incident, and the kernel compiled and installed without error. The next step was to reboot into the new kernel. When the kernel panicked, I figured that I had missed something in the kernel configuration, so I rebooted back to my older kernel, which also panicked. Since this system had been running not 15 minutes earlier, I knew things were about to get ugly.
At this point, I remembered that I had been doing some testing with an Ubuntu live CD, so I booted the live CD. At least now, I could get some work done, even though my workstation was “toes up.” This would also give me a platform from which to work on my regular hard drive, or so I thought. When I attempted to mount /dev/sda3, I was told that it didn't exist. Fdisk told me that my partition table was mostly gone! All that was left was /dev/sda1, where I keep my kernel, and /dev/sda2, which is where I swap. I posted a message describing my situation to the Gentoo user's group and was told that I should look into a program called testdisk.
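For readers curious about what fdisk was looking at: on a classic MBR-partitioned disk, the partition table is just four 16-byte entries near the end of the first 512-byte sector. The following is a minimal sketch of reading those entries, assuming an MBR layout (not GPT) and looking only at the four primary slots; the function name parse_mbr is made up for this illustration and is not part of fdisk or testdisk.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446           # the four primary entries start here
ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR


def parse_mbr(sector):
    """Return the non-empty primary partition entries from a 512-byte MBR."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # Layout: status, 3 CHS bytes, type, 3 CHS bytes, first LBA, sector count.
        status, ptype, first_lba, count = struct.unpack("<B3xB3xII", raw)
        if ptype == 0 and count == 0:
            continue  # unused slot
        entries.append({
            "number": index + 1,
            "bootable": status == 0x80,
            "type": ptype,          # e.g. 0x83 Linux, 0x82 Linux swap
            "first_lba": first_lba,
            "sectors": count,
        })
    return entries
```

Reading the first 512 bytes of a device such as /dev/sda (as root, opened read-only in binary mode) and passing them to parse_mbr would list whichever slots survived. Saving a copy of that sector somewhere safe while the disk is healthy is a cheap precaution.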
I figured that I should at least assess /dev/sda1, so I tried to mount it. No such luck. The filesystem wasn't recognized. A quick look at /proc/filesystems told me that Ubuntu hadn't loaded ext2 support into the kernel. Further investigation revealed that Ubuntu loaded all of its drivers from an initial RAM disk and they weren't immediately available in /lib/modules. I couldn't bring myself to dissect an initial RAM disk image on a system that was RUNNING on a RAM disk, so out came the Gentoo installation CD.
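The /proc/filesystems check is easy to script. Each line of that file holds a filesystem name, optionally preceded by the word nodev for filesystems that don't need a block device. Here is a small sketch of the same check; the two function names are invented for this example.

```python
def supported_filesystems(text):
    """Parse the contents of /proc/filesystems into a set of names."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            # Lines read either "nodev <name>" or just "<name>".
            names.add(fields[-1])
    return names


def has_filesystem(name, path="/proc/filesystems"):
    """Report whether the running kernel currently lists the filesystem."""
    with open(path) as handle:
        return name in supported_filesystems(handle.read())
```

A False result from has_filesystem("ext2") only means the driver isn't registered right now; it may still exist as a module that hasn't been loaded.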
It was while watching the Gentoo CD boot that I saw the IDE disk seek error messages for the first time. I don't reboot my system very often, and the Ubuntu live CD hides those messages from you, so who knows how long I'd been working with a drive that needed to be replaced?
Once the Gentoo CD had booted, it was time to try to recover my system. I discovered that testdisk wasn't installed on the CD, so I had to wget and untar it first. Oddly enough, I had to run testdisk and reboot a couple of times before I had a partition table that looked sane. When I tried to mount the filesystem, I was told that mount couldn't find a valid filesystem. As a last-ditch effort, I decided to try to fsck the filesystem anyway. The fsck program reported that it couldn't find a superblock, but this was the first good news I had received so far; I knew I could use the -b parameter and ask fsck to use a backup superblock. At least fsck hadn't choked completely. So, I issued a command like fsck -y -t ext2 -b 8192 /dev/sda3 to see what would happen. When fsck started to spew error messages indicating fix-ups it was performing, I decided that the process would take a while and went to bed for the night.
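The right number to hand to -b depends on the filesystem's block size. The e2fsck manual page lists 8193 for 1K blocks, 16384 for 2K blocks, and 32768 for 4K blocks as places where a backup superblock can be found. The sketch below computes those locations under two assumptions that may not hold for every filesystem: the default geometry of 8 × block size blocks per group, and the sparse_super feature, which keeps backups only in group 1 and in groups that are powers of 3, 5, or 7. The function names are illustrative only.

```python
def is_power_of(n, base):
    """True if n is base**k for some k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def backup_superblocks(block_size, group_count):
    """Return candidate backup superblock block numbers for ext2/ext3.

    Assumes default geometry (8 * block_size blocks per group) and the
    sparse_super feature; a filesystem created with other options will
    keep its backups elsewhere.
    """
    blocks_per_group = 8 * block_size
    # With 1K blocks, block 0 is the boot block, so group 0 starts at block 1.
    first_data_block = 1 if block_size == 1024 else 0
    locations = []
    for group in range(1, group_count):
        if group == 1 or any(is_power_of(group, b) for b in (3, 5, 7)):
            locations.append(first_data_block + group * blocks_per_group)
    return locations
```

If the first candidate doesn't work, the next ones in the list are worth trying in turn, since each is an independent copy of the superblock.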
When I woke up, I found that fsck had finished, so I mounted the resulting filesystem. I was really hoping to see all of my files intact, but no, all I saw was /lost+found. When I cd'ed into the lost+found directory, I got my first glimpse of just how bad things had been. The fsck program had done its job and recovered my filesystem, but it was unable to recover any of the file names at the root of the partition, so it moved the files to the lost+found directory and renamed each file after its inode number. All I had was a list of files and directories with names resembling #19539303. And the directory list was several screens in length; I usually keep a pretty clean / directory, so obviously, fsck had encountered a lot of trouble.
One of these oddly-named directories was my /home directory. I made an educated guess as to which one that was and sure enough, I had user directories. (My /home directory was the one reported with the largest file size.) Deeper inspection revealed that most of my files seemed to be there, and they were properly named! I was in business!
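Guessing which #-numbered directory is which gets easier with a quick survey. The sketch below is not the method described above (which went by the size reported in the directory listing); as a stand-in, it ranks each recovered directory by how many entries it holds directly and shows a few of their names, so that something like /home stands out by its contents. The function name is hypothetical.

```python
import os


def rank_recovered_dirs(lost_found):
    """List directories under lost+found, those with the most entries first.

    Returns (name, entry_count, first_few_child_names) tuples.
    """
    ranked = []
    with os.scandir(lost_found) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue  # plain files are ignored in this survey
            try:
                children = sorted(os.listdir(entry.path))
            except OSError:
                children = []  # unreadable directory on a damaged disk
            ranked.append((entry.name, len(children), children[:5]))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Only the top level of each directory is read, which keeps the survey gentle on a drive that is already struggling.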
When my new disk arrived, I installed it and started copying my old files onto the new drive. I was immediately struck by how slow this process was going. It was as if I were transferring the files over a dial-up modem! It didn't help that the IDE subsystem had reset a few times in the process. At this rate the new drive would be out of warranty by the time my file recovery was complete, so I had to do something. It turns out that I had accumulated a lot of files in my home directory that I really didn't need. I had downloaded games and other software and simply built them in my home directory rather than installed them on the system. After I had pruned out all of the files and directories that I didn't care about, I was able to recover the rest of my /home directory.
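The prune-then-copy approach can also be done in a single pass. Here is a rough sketch that copies a tree while skipping named directories or files and, rather than stopping at the first read error, records the failures and moves on. The function name and the skip list are assumptions made for illustration; a dedicated rescue tool may be a better fit for a badly failing drive.

```python
import os
import shutil


def salvage_tree(src, dst, skip_names=()):
    """Copy src into dst, skipping named entries and noting unreadable files.

    Returns a list of (path, error_message) pairs for files that failed.
    """
    failed = []
    for dirpath, dirnames, filenames in os.walk(src):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        relative = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            if name in skip_names:
                continue
            source = os.path.join(dirpath, name)
            try:
                shutil.copy2(source, os.path.join(target_dir, name))
            except OSError as error:
                # A bad sector should cost one file, not the whole copy.
                failed.append((source, str(error)))
    return failed
```

Because skipped directories are never entered, the failing drive isn't asked to read data that was going to be thrown away anyway.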
So there you have it. When I started, I had a dead machine, a failing hard drive, a corrupt partition table, and a corrupt filesystem. When I had finished, I had at least recovered the important files from the system and had been able to carry on my day-to-day work without too much interruption, thanks to the Live CD. But there are some lessons to be learned here, which is why I chose to write about my experience.
I should have backed up yesterday. But for the record, my business files were on my server and I have redundant, off-site backups of them. I was mostly interested in recovering my password wallet, a few pictures and videos that I'd saved, and a few miscellaneous documents. OK, lesson learned.
But there's more. I was grateful to be able to keep running using a Live CD. However, I'm a KDE user and the Ubuntu CD that I had was Gnome-based. I got my work done, but it would have been nice to be in an environment that I was accustomed to using. In the future, I'll be keeping a Knoppix or Kubuntu CD handy.
I also found that my Gentoo CD just wasn't up to the task of system recovery. I'll be burning a genuine recovery disk, as soon as I have a system on which to burn CDs.
I really needed to have a set of emergency CDs handy for this situation. I could see having a CD wallet that had a Live CD, a Recovery CD, and an Installation CD. Having these CDs handy would have saved me a lot of time.
That said, I have to say that I'm glad to have been able to recover my data and that I wasn't down too terribly long in the process. I also wanted to mention how helpful the Linux Community is in times like this. I'm a fairly experienced Linux user, but it was sure nice to be able to ask questions before I actually committed changes to disk. I hope my tale of woe serves as both warning and encouragement to you; stuff happens, and you can recover from it.
Mike Diehl is a freelance Computer Nerd specializing in Linux administration, programming, and VoIP. Mike lives in Albuquerque, NM, with his wife and 3 sons. He can be reached at firstname.lastname@example.org