Disk Maintenance under Linux (Disk Recovery)
Invoking fsck from the command line on any given partition will probably not result in a check being run, because you have not reached the predetermined maximum mount count; therefore, the system believes the file system is clean and not in need of checking. To force the check, invoke fsck with -f.
At this point, one of two things will happen: fsck will begin to run correctly and check your disk partition (possibly hesitating at the bad spots on the disk and issuing appropriate error messages before continuing) or it will terminate without running, leaving error messages behind. If fsck does not run, you'll have to give the program additional information as indicated in the error messages. Probably the most common information you'll need to pass to e2fsck is the address of the alternate superblock or the block size so that e2fsck can calculate where an alternate superblock is located. The -b switch will tell e2fsck to use the alternate superblock, but we'll have to tell e2fsck where to find one. On ext2 file systems, superblocks are normally located at 8193, 16385 and higher multiples of 8192+1 (see dumpe2fs explanation below). As an alternative, we can pass e2fsck the block size with the -B switch (once we have that information) to allow e2fsck to calculate alternate superblock locations. Later I'll tell you where to get the block size value if you ever need it.
At this point, it's worth mentioning two other mutually exclusive switches available to fsck and e2fsck. The first is the -n switch, which tells fsck to answer no to all queries, and will leave the file system in its original condition making no repairs. The second is the -y switch, which automatically corrects any errors it finds. Generally, to speed things up, you may want to run fsck with the -y switch. So, why don't we just use this option all the time? I strongly recommend against this course of action, if you suspect problems with the file system. While fsck will usually not encounter problems, typing fsck -y and then taking a coffee break, leaving the machine to take care of itself, is not particularly prudent. If, in the interests of speed, you use the automatic answer yes switch to do routine checks, be sure to list your lost+found directories from time to time. Besides, you'll really want to note the block or inode numbers that appear while fsck runs, so that you can check them later to see if they are allocated to files.
The other available options for fsck and e2fsck can be found in the man pages. I consider the fsck and e2fsck man pages fairly well written, as is appropriate considering the importance of these utilities to your file system's health.
You may encounter messages asking if you want fsck to correct an error. Answering no will normally terminate the program so that you may fix the problem and rerun fsck. However, most error messages you're likely to encounter are fairly routine, and you may safely answer yes to them. If you see a message such as inode 1234 unattached, it means the file pointed to by inode (information node) 1234 has, for one reason or another, lost its filename. This can occur for several reasons, including a power failure or a computer reset without a proper disk sync.
Other common errors include zero time inodes, which are also due to the disk not being properly synced before shutdown. If you see these errors frequently and you've been shutting down your system correctly, you may have any number of other problems. In this case, you could begin by checking your power and data connections and your power supply for fluctuations or passing too much noise. Finally, check your hard disk parameters. I must caution you that altering the default hard disk parameters could do serious damage to your file system or corrupt your files—be careful.
One lost+found directory should be located in the root partition of each file system. If you have, for example, two mounted file systems, /usr and /home, you should have three lost+found directories. These directories will contain files whose inodes have become disconnected from their file names. The files in these directories will have the form ./#nnnn, where nnnn is the inode number used as the file name. You may be able to determine what the file is by inspecting it using cat. If cat returns what appears to be garbage, you probably have a binary file. In this case, you can do a chmod +x #nnnn, and then run the file. These procedures should give you enough information to learn what the file is. If the file is important, it can be renamed and moved to its original location; otherwise, it can be deleted.
Free DevOps eBooks, Videos, and more!
Regardless of where you are in your DevOps process, Linux Journal can help!
We offer here the DEFINITIVE DevOps for Dummies, a mobile Application Development Primer, and advice & help from the expert sources like:
- Linux Journal
- Be a Mechanic...with Android and Linux!
- New Products
- Users, Permissions and Multitenant Sites
- Flexible Access Control with Squid Proxy
- Security in Three Ds: Detect, Decide and Deny
- High-Availability Storage with HA-LVM
- Tighten Up SSH
- DevOps: Everything You Need to Know
- Solving ODEs on Linux
- Non-Linux FOSS: MenuMeters