A Full Root Partition

Recently I needed to solve the problem of an apparently 100%-full root partition on a server that I am responsible for monitoring. This was unexpected and sudden, as the partition's usage had been holding steady at about 30% for quite some time.

From that experience, I learned a simple but useful fact about the way the Linux kernel deals with open files that I'd like to share, in hopes that it will save others some time troubleshooting a similar problem. To summarize what will be explained below: if a process opens a file and the file is subsequently deleted while still held open by the process, the file continues to exist and physically consumes disk space until the process closes it or exits. This consequence of the way the unlink() system call works can be confusing, because df still counts the space as used, while du and similar commands, which walk directory entries, consider it free.
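A quick way to see the effect for yourself is to hold a file open while deleting it. The following is just an illustrative sketch (the file name and location are arbitrary; run it on whichever filesystem you want to watch):

   $ dd if=/dev/zero of=/tmp/hog bs=1M count=100   # create a 100M file
   $ tail -f /tmp/hog > /dev/null &                # hold it open from a background process
   $ rm /tmp/hog                                   # unlink it while it is still open
   $ df -h /tmp                                    # df still counts the 100M as used
   $ du -sh /tmp                                   # du no longer sees it
   $ kill %1                                       # once the process exits, df and du agree again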

1.) Check mount points and partition free space according to df:

# df -h
   Filesystem            Size  Used Avail Use% Mounted on
   /dev/sda6             373M  373M     0 100% /

...df says I'm out of space on my root partition!

2.) Check disk usage by directory entries under / (if multiple hard links exist to the same file, only count its size once in the total):

$ du -bc `find / -type f -links 1 -print`

NOTE 1: Don't forget to exclude directories that may be mounted on other partitions, e.g. /home, /var, /usr
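One convenient way to avoid crossing into other partitions (rather than listing exclusions by hand) is find's -xdev option, which keeps the search on the filesystem containing the starting directory, for example:

   $ du -bc `find / -xdev -type f -links 1 -print`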

NOTE 2: This command should work, but on my system it complains and fails when find returns a very large number of files (presumably because the expanded argument list becomes too long for the shell). Instead, I used a Perl script that does the equivalent of the du command above:

 ############################  disk_usage.pl   ############################
   #! /usr/bin/perl

   #=-
   #=- Print total bytes used by files beneath the specified directory.
   #=- Only unique inode numbers are counted, so that multiple hard links
   #=- to the same file only count once toward the total file size for the dir.
   #=-

   use strict;
   use warnings;

   my $verbose = 0;

   # open a pipe to a find command to locate files starting in the specified
   # directory and print a simply-formatted string for each...
   my $find_targets = join ' ', @ARGV;

   open my $found_list, '-|',
       qq{find $find_targets -type f -printf "%s\tinode: %i\tlink count: %n\t%p\n"}
       or die "Can't open find command: $!";

   # process our simple output, keeping track of inode numbers and file sizes...
   my %h;
   while( my $output_line = <$found_list> ) {
       my ($size, $inode) = $output_line =~ m/^(\d+)\s+(inode: \d+)/;
       $h{$inode} = $size;
       print $output_line if $verbose;
   }

   # calculate and display total...
   my $total = 0;
   $total += $_ for values %h;
   print "-" x 25 . "\nTOTAL SIZE (UNIQUE FILES): " . $total . "\n";

   ##########################################################################

$ sudo disk_usage.pl /
...
TOTAL SIZE (UNIQUE FILES): 33747784

This total, about 34M, was calculated from directory entries, the same basis du uses to measure disk usage. So why does df report that no space is left? If only 34M are used, we should still have around 339M free (some of which is reserved for the superuser)!

The answer is that a process has opened a file that was subsequently unlink()ed (e.g. via the rm command). The file's contents remain on disk until the process closes the file or exits. In my case, an errant process was keeping the deleted data on disk. The next step: find files that are being held open by processes but are no longer referenced by any directory entry:

   # lsof | grep deleted
   gpm        1081  someuser  1u  REG  8,7          5  38166  /var/run/gpmcDrT4c (deleted)
   gnome-ses  1274  someuser  1w  REG  8,6  280546304  68281  /home/someuser/.xsession-errors (deleted)
   gnome-ses  1274  someuser  2w  REG  8,6  280546304  68281  /home/someuser/.xsession-errors (deleted)
   metacity   1367  someuser  1w  REG  8,6  280546304  68281  /home/someuser/.xsession-errors (deleted)
   ...additional lines omitted

Here, several processes in someuser's X session were holding open a deleted .xsession-errors file of about 280M. It was easy to solve the problem by having 'someuser' log out of X, which closed the file and released the space.
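If ending the offending process is not an option, the space can often be reclaimed on Linux by truncating the deleted file through the process's entries under /proc. The PID and file-descriptor numbers below are taken from the lsof output above, so substitute your own:

   # ls -l /proc/1274/fd | grep deleted    # confirm which descriptor points at the deleted file
   # : > /proc/1274/fd/1                   # truncate it in place, releasing the disk blocks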

This Tech Tip comes from Karl in Kentucky, USA. Thank you, Karl!

Instant fame is easy at Linux Journal. Just send us your useful Tech Tips to share with the Linux Community, and we'll send you a cool t-shirt for your efforts!

Please note: Tech Tips featured in this specific section of LinuxJournal.com are kindly brought to us by readers and are not necessarily tested by LinuxJournal.com editors.