Hack and / - A Little Spring Cleaning
The duck command works great to discover how space is being used in your home directory, but if you are like me, your home directory is actually on a different partition from the root filesystem. If root is filling up, you still can use the duck command (with a slight tweak) to see which directories consume the most space. You need root privileges to scan all the directories in the root filesystem, so use either su or sudo -s (depending on how you get root permissions) before you run the duck command:
# cd /
# du -ckx | sort -n
. . .
243920   ./usr/lib/openoffice
277600   ./var/cache/apt
296376   ./var/cache
475144   ./var
952096   ./usr/share
1099264  ./usr/lib
2259332  ./usr
2908804  .
2908804  total
The extra -x argument I added above tells du to stay on one filesystem—in this case, the root filesystem. If you don't specify -x and you have /home or other directories on different filesystems, du will scan through those partitions as well, so you will have to skip over them as you scan through your results. As you can see from this output, the /usr directory takes up the bulk of the space on my system, with /usr/lib using almost half the space inside /usr. Also note that /var/cache/apt is listed here—more on how to deal with that below.
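Incidentally, if the full recursive listing is more detail than you want, GNU du (assuming the GNU coreutils version most distributions ship) supports a --max-depth option that trims the report to the top directory levels:

# du -ckx --max-depth=2 / | sort -n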
Now that you know how your storage is being used, here are a few common-sense ways to manage those files and free some space. If you do Linux programming, build software from source or regularly download tarballs, you probably have these tarballs lying around along with their extracted directories. One easy way to free up space is to delete either the tarball or the extracted directory. If you build your own kernels, you probably have a number of old kernel source trees in /usr/src that you won't ever use again and could stand to delete.
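A find one-liner can track those down; the paths and the 10MB size threshold here are only examples, so point it at wherever you keep source code:

# find /home /usr/src -xdev -type f \( -name '*.tar.gz' -o -name '*.tgz' -o -name '*.tar.bz2' \) -size +10M -exec ls -lh {} \;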
Another common space-waster is old ISO files. Do you really still need that Red Hat 7.2 ISO? If so, burn an archive copy or two to CD, and then delete the image. Along those same lines, audio files often end up with an extra copy in a directory for a mix CD, and if you play with video conversion tools like I do, you probably have video files in different phases of being transcoded. If you are done with a project, why not delete them and save the space?
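The same trick works for images and media files; again, the extensions and the 100MB threshold are just a starting point:

# find /home -xdev -type f \( -name '*.iso' -o -name '*.avi' -o -name '*.mpg' \) -size +100M -exec ls -lh {} \;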
On desktops, but especially on servers, one of the most common places you will find wasted space is in log directories. Logs definitely can be useful, but some logs and some levels of debugging are useful only immediately after a bug is found; the rest of the time, they can be truncated or archived safely. Take a look in /var/log/, and see how many large uncompressed log files you have. If a file is no longer being written to, gzip it. If you haven't tried it before, you would be amazed how well large log files compress. If you aren't sure whether a log file is still being written to, use lsof to check:
# lsof | grep "/path/to/filename"
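If lsof prints nothing, the file is closed and safe to compress. Assuming your rotated logs end in numeric suffixes such as .0 and .1, a quick shell loop takes care of them all in one pass:

# for log in /var/log/*.[0-9]; do gzip "$log"; done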
If you regularly find yourself cleaning up or gzipping the rotated log files in /var/log (they append .0, .1 and so on as they are being rotated), then edit /etc/logrotate.conf and enable compression. Usually, this simply requires finding the commented line labeled #compress and uncommenting it.
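On a Debian-style system, enabling it looks something like this (the comment text may vary between distributions); just remove the leading # from the compress line:

# uncomment this if you want your log files compressed
compress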
Another great place to free up space is in your package manager's local package cache. For instance, in the case of Debian-based systems, the packages apt downloads are cached in /var/cache/apt/archives. You could go to that directory and remove the files manually, or you simply could become root and type:
# apt-get autoclean
to remove all the cached packages you no longer need.
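If you want to go further, apt-get clean removes every cached package, not just the obsolete ones autoclean targets:

# apt-get clean

If you have a distribution that uses yum, the following two commands will clear out the cached headers and packages from your system: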
# yum clean headers
# yum clean packages
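Alternatively, a single yum clean all clears the whole cache, metadata included:

# yum clean all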
Finally, archiving can be a good solution when cleaning your storage space. If you have a local file server or one machine with more storage than the rest, why not make sure all your large files exist only there and then access them over the network? Alternatively, burn large files you want to keep but don't immediately need to CD or DVD.
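If you go the file-server route, rsync makes the transfer easy; fileserver and the paths below are hypothetical, so substitute your own:

$ rsync -av ~/videos/ fileserver:/srv/archive/videos/

Verify the copy arrived intact before you delete anything locally. Once you are done, you'll have plenty of newly freed space—hopefully, enough to last you until next spring.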
Kyle Rankin is a Senior Systems Administrator in the San Francisco Bay Area and the author of a number of books, including Knoppix Hacks and Ubuntu Hacks for O'Reilly Media. He is currently the president of the North Bay Linux Users' Group.