Hack and / - A Little Spring Cleaning

HOWTOs

by Kyle Rankin

on February 1, 2008

No matter how big your hard drives are, at some point you're going to look at your storage and wonder where all the space went. Your /home directory is probably a good example. If you are like me, you don't always clean up after yourself or organize immediately after you download a file. Sure, I have directories for organizing my ISOs, my documents and my videos, but more often than not, my home directory becomes the digital equivalent of a junk drawer with a few tarballs here, an old distribution ISO there and PDF specs for hardware I no longer own. Although some of these files don't really take up space on the disk—it's more a matter of clutter—when I'm running out of storage, I'd like to find the files that take up the most space and decide how to deal with them quickly. This month, I introduce some of my favorite commands for locating space-wasting files on my system and follow up with common ways to clear some space.

Think Locally

First, let's start with file clutter in your main home directory. Although all major GUI file managers these days make it easy to sort a directory by size, because I'm focusing on command-line tips, let's cover how to find the largest files in the current directory via the old standby, ls. If you type:

$ ls -lSh

you'll get a list of all the files in your current directory sorted by size. Of course, if you have a lot of files in the directory, the files you most want to see are probably somewhere along the top of the list, so I typically like to type:

$ ls -lSh | less

to view the list slowly starting at the top. Or, if I'm in a hurry, I type:

$ ls -lSh | head

to see only the top ten largest files. Now, this is pretty basic, but it's worth reviewing, as you'll use these commands over and over again to track down space-wasting files. Depending on how you structure your home directory, you probably won't find all the large files together. It's more likely that they are scattered into different subdirectories, so you then need to scan through your directory structure recursively, tally up the disk space used in each directory, and sort the output. Luckily, you don't have to resort to ls for this; du does the job quite nicely. For instance, one common use for du that I see referenced a lot is the following:

$ du -sh *

This scans through all the subdirectories you list as arguments (in this case, all the subdirectories within my current directory) and then lists them one by one with human-readable file sizes (the -h option converts the file sizes into megabytes, gigabytes and so forth, so it's easier to read). Here's some example output from that command:

456K    bin
28K     Default-Compiz
16K     hl4070cdwcups-1.0.0-7.i386.deb
344K    hl4070cdwlpr-1.0.0-7.i386.deb
27M     images
60K     LexmarkC750.ppd
850M    mail

Although you certainly could work with this information, it would be much easier if it were sorted. To do that, replace the -h argument with -k, and then pipe the output to sort:

$ du -sk * | sort -n

16      hl4070cdwcups-1.0.0-7.i386.deb
28      Default-Compiz
60      LexmarkC750.ppd
344     hl4070cdwlpr-1.0.0-7.i386.deb
456     bin
10224   writing
26948   images
869588  mail

This works better, because now I can see that my local e-mail cache is taking up the bulk of the storage; however, next I would need to change to the mail directory and run the command again, over and over, until I narrow it down to the subdirectory that has the large files. That's why I normally skip the above commands and go straight for what I affectionately call the duck command:

$ du -ck | sort -n
. . .
87704   ./.mozilla
87704   ./.mozilla/firefox
119236  ./mail/example.net/sent-mail-2004
119236  ./mail/example.net/sent-mail-2004/cur
869852  ./mail
869852  ./mail/example.net
1064100 .
1064100 total

The -c option essentially recurses into each subdirectory like before, except it keeps a running tally of the space used by each subdirectory down the tree, not just the first level of directories. When it reports its findings, it might list the same top-level directory multiple times. This makes it easy to drill down to the actual directory that consumes the most space, which in this example seems to be ./mail/example.net/sent-mail-2004/cur. If I wanted to clean up files there, I could cd to that directory and then run the ls commands I used above to see which files used the most space.

Act Globally

The duck command works great to discover how the space is being used in your home directory, but if you are like me, your home directory is actually on a different partition from the root filesystem. If root is filling up, you still can use the duck command (with a slight tweak) to see which directories consume the most space. You need root privileges to scan all the directories in your root filesystem, so use either su or sudo -s (depending on how you get root permissions) before the duck command:

# cd /
# du -ckx | sort -n
. . .
243920  ./usr/lib/openoffice
277600  ./var/cache/apt
296376  ./var/cache
475144  ./var
952096  ./usr/share
1099264 ./usr/lib
2259332 ./usr
2908804 .
2908804 total

The extra -x argument I added above tells du to stay on one filesystem—in this case, the root filesystem. Otherwise, if you don't specify -x and you have /home or other directories on different filesystems, du will scan through those partitions as well, so you ultimately will have to skip them out as you scan through your results. As you can see from this output, the /usr directory takes up the bulk of the space on my system, with /usr/lib using almost half the space inside /usr. Also note that /var/cache/apt is listed here—more on how to deal with that below.

Free as in Space

Now that you know how your storage is being used, here are a few common-sense ways to manage those files and free some space. If you do Linux programming, build software from source or regularly download tarballs, you probably have these tarballs lying around along with their extracted directories. One easy way to free up space is to delete either the tarball or the extracted directory. If you build your own kernels, you probably have a number of old kernel source trees in /usr/src that you won't ever use again and could stand to delete.

Another common space-waster is old ISO files. Do you really still need that Red Had 7.2 ISO? If so, burn an archive copy or two to CD and then delete the image. Along those same lines, audio files always end up with either an extra copy in a directory for a mix CD, or if you play with video conversion tools like me, you have video files in different phases of being transcoded. If you are done with a project, why not delete them and save the space?

On desktops, but especially on servers, one of the most common places you will find wasted space is in log directories. Logs definitely can be useful, but some logs and some levels of debugging are useful only immediately after a bug is found; the rest of the time they can be truncated or archived safely. Take a look in /var/log/, and see how many large uncompressed log files you have. If the file is no longer being used, you should gzip it. You would be amazed how far you can compress incredibly large log files if you haven't tried it before. If you aren't sure whether a log file is still being written to, use lsof to check:

# lsof | grep "/path/to/filename"

If you regularly find yourself cleaning up or gzipping the rotated log files in /var/log (they append .0, .1 and so on as they are being rotated), then edit /etc/logrotate.conf and enable compression. Usually, this simply requires finding the commented line labeled #compress and uncommenting it.

Another great place to free up space is in your package manager's local package cache. For instance, in the case of Debian-based systems, the packages apt downloads are cached in /var/cache/apt/archives. You could go to that directory and remove the files manually, or you simply could become root and type:

# apt-get autoclean

to remove all the cached packages you no longer need. If you have a distribution that uses yum, the following two commands will clear out the cached headers and packages from your system:

# yum clean headers
# yum clean packages

Finally, archiving can be a good solution when cleaning your storage space. If you have a local file server or one machine with more storage than the rest, why not make sure all your large files exist only there and then access them over the network? Alternatively, burn large files you want to keep but don't immediately need to CD or DVD. Once you are done, you'll have plenty of newly freed space—hopefully, enough to last you until next spring.

Kyle Rankin is a Senior Systems Administrator in the San Francisco Bay Area and the author of a number of books, including Knoppix Hacks and Ubuntu Hacks for O'Reilly Media. He is currently the president of the North Bay Linux Users' Group.

Load Disqus comments