Hack and / - When Disaster Strikes: Hard Drive Crashes

All is not necessarily lost when your hard drive starts the click of death. Learn how to create a rescue image of a failing drive while it still has some life left in it.

The following is the beginning of a series of columns on Linux disasters and how to recover from them, inspired in part by a Halloween Linux Journal Live episode titled “Horror Stories”. You can watch the original episode at www.linuxjournal.com/video/linux-journal-live-horror-stories.

Nothing teaches you about Linux like a good disaster. Whether it's a hard drive crash, a wayward rm -rf command or fdisk mistakes, there are any number of ways your normal day as a Linux user can turn into a nightmare. Now, with that nightmare comes great opportunity: I've learned more about how Linux works by accidentally breaking it and then having to fix it again, than I ever have learned when everything was running smoothly. Believe me when I say that the following series of articles on system recovery is hard-earned knowledge.

Treated well, computer equipment is pretty reliable. Although I've experienced failures in just about every major computer part over the years, the fact is, I've had more computers outlast their usefulness than not. That being said, there's one computer component you can almost count on to fail at some point—the hard drive. You can blame it on the fast-moving parts, the vibration and heat inside a computer system or even a mistake on a forklift at the factory, but when your hard drive fails prematurely, no five-year warranty is going to make you feel better about all that lost data you forgot to back up.

The most important thing you can do to protect yourself from a hard drive crash (or really most Linux disasters) is back up your data. Back up your data! Not even a good RAID system can protect you from all hard drive failures (plus RAID doesn't protect you if you delete a file accidentally), so if the data is important, be sure to back it up. Testing your backups is just as important as backing up in the first place. You have not truly backed up anything if you haven't tested restoring the backup. The methods I list below for recovering data from a crashed hard drive are much more time consuming than restoring from a backup, so if at all possible, back up your data.

Now that I'm done with my lecture, let's assume that for some reason, one of your hard drives crashed and you did not have a backup. All is not necessarily lost. There are many different kinds of hard drive failure. Now, in a true hard drive crash, the head of the hard drive actually will crash into the platter as it spins at high speed. I've seen platters after a head crash that are translucent in sections as the head scraped off all of the magnetic coating. If this has happened to you, no command I list here will help you. Your only recourse will be one of the forensics firms out there that specialize in hard drive recovery. When most people say their hard drive has crashed, they are talking about a less extreme failure. Often, what has happened is that the hard drive has developed a number of bad blocks—so many that you cannot mount the filesystem—or in other cases, there is some different failure that results in I/O errors when you try to read from the hard drive. In many of these circumstances, you can recover at least some, if not most, of the data. I've been able to recover data from drives that sounded horrible and other people had completely written off, and it took only a few commands and a little patience.

Create a Recovery Image

Hard drive recovery works on the assumption that not all of the data on the drive is bad. Generally speaking, if you have bad blocks on a hard drive, they often are clustered together. The rest of the data on the drive could be fine if you could only access it. When hard drives start to die, they often do it in phases, so you want to recover as much data as quickly as possible. If a hard drive has I/O errors, you sometimes can damage the data further if you run filesystem checks or other repairs on the device itself. Instead, what you want to do is create a complete image of the drive, stored on good media, and then work with that image.

A number of imaging tools are available for Linux—from the classic dd program to advanced GUI tools—but the problem with most of them is that they are designed to image healthy drives. The problem with unhealthy drives is that when you attempt to read from a bad block, you will get an I/O error, and most standard imaging tools will fail in some way when they get an error. Although you can tell dd to ignore errors, it happily will skip to the next block and write nothing for the block it can't read, so you can end up with an image that's smaller than your drive. When you image an unhealthy drive, you want a tool designed for the job. For Linux, that tool is ddrescue.

______________________

Kyle Rankin is a director of engineering operations in the San Francisco Bay Area, the author of a number of books including DevOps Troubleshooting and The Official Ubuntu Server Book, and is a columnist for Linux Journal.

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Partially unrecoverable allocation table

lopo's picture

Tried it all but my rescued image does not show any file under /home/lopo although they are there. On an 92GB partition on less than 700KB are damaged but it was enough to lost everything ;(

Foremost does not reconigzed a lot of my files: bz2, gz, svg, odf files, etc., so -t all is not really ALL.

dd progress report with USR1 signal

RALi's picture

While I have no quibble with using ddrescue instead of dd (the wonderful thing about *nix is that there are 72 different ways to do anything) I do have to comment on your statement: "In fact, in some circumstances, I prefer using ddrescue over dd for regular imaging as well, just for the progress output."

dd provides progress info if you send a USR1 to the process. From the dd(1) man page:

Sending a USR1 signal to a running `dd' process makes it
print I/O statistics to standard error and then resume copying.

I guess I'm lucky

goblin's picture

I guess I'm lucky then.
I've never had a hard drive fail on me. Ever. Not at work, not at home.
I still keep my first PC around, with its fully functional hard drive in it (a 80486 80MHz with a 528 MB hard drive, from 1994 I think).

A Better 'Method of Last Resort'

mmueller's picture

If you can't mount a ddrescue image, but need to retrieve documents, photographs, pdf files, ect. you can use a nifty program called "foremost." It is available for most *nix platforms, and on Windoze via cygwin. Foremost scans through a hard drive image, mountable or not, and looks for recognizable file headers. It understands over twenty popular file headers including jpg, pdf, doc, xml, etc. When it finds these files it dumps them out as usable files. It is truly a thing of beauty. The first time you use it, you will just sit back in amazement. For example, if you have a hard drive image named my_hd_image.dd, that was made with one of the dd utilities, you could execute the following command:

~$ foremost -t all -i my_hd_image.dd

After the command executes, a subdirectory will be created that has all of the recovered files, organized neatly by file type. On Ubuntu, you can get foremost by typing:

~$ sudo apt-get install foremost

This tools is also excellent for recovering deleted files from USB drives, etc. Enjoy!

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState