Tech Tip: Find Directories Over a Certain Size
It's fairly simple to find large files on your system using commands such as find, but if you're looking for directories over a certain size, find won't help you: its -size test looks at the directory entry itself, not at the total size of its contents. The Perl script presented here can help you track down those explosively large directories.
The script prints those directories under a given path whose size is above a certain threshold. It also allows you to exclude directories that match a certain pattern from consideration. The command accepts the following options:
- -d - Specifies the base directory to search.
- -t - Specifies the threshold in megabytes (e.g., 100 means 100 MB). Defaults to 1000 if omitted.
- -x - Specifies a glob pattern for directories to ignore; it is passed to du's --exclude option.
The following examples show how it can be used:
$ ./file.pl -d ../../ -t 100 -x '{pr*,jd*,tp*,sim*}'
165,/export/home/fengd/CMS/apache-tomcat-6.0.13/logs
274,/export/home/fengd/CMS/apache-tomcat-6.0.13
318,/export/home/fengd/CMS
400,/export/home/fengd/apache-tomcat-6.0.13/bin
417,/export/home/fengd/apache-tomcat-6.0.13
909,/export/home/fengd
909,total
$ ./file.pl -d ../../ -t 100 -x 'simulator*'
178,/export/home/fengd/CMS/apache-tomcat-6.0.13/logs
289,/export/home/fengd/CMS/apache-tomcat-6.0.13
333,/export/home/fengd/CMS
400,/export/home/fengd/apache-tomcat-6.0.13/bin
422,/export/home/fengd/apache-tomcat-6.0.13
757,/export/home/fengd/project/cpp/fileTrans
766,/export/home/fengd/project/cpp
334,/export/home/fengd/project/log/tmp
492,/export/home/fengd/project/log
391,/export/home/fengd/project/store/array
391,/export/home/fengd/project/store
1755,/export/home/fengd/project
133,/export/home/fengd/tptp/config
200,/export/home/fengd/tptp
105,/export/home/fengd/jdk
2994,/export/home/fengd
2994,total
The source code for the command follows:
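#!/usr/bin/perl -w
use strict;
use Getopt::Std;
use Cwd 'abs_path';

# Parse -d (base directory), -t (threshold in MB) and -x (exclude pattern).
my %dir;
getopt("dtx", \%dir);
if (!defined $dir{d}) {
    print "Usage: program -d dir [-t threshold] [-x exclude pattern]\n";
    exit 1;
}
# Default threshold: 1000 MB.
if (!defined $dir{t}) {
    $dir{t} = 1000;
}
# Build the du command: -m reports sizes in megabytes, -c adds a grand total.
my $f = abs_path($dir{d});
my $cmd = "du -m -c $f";
if (defined $dir{x}) {
    # The pattern is interpolated into the shell command unquoted,
    # so neither it nor the path may contain spaces.
    $cmd = $cmd . " --exclude=$dir{x}";
}
my $line = `$cmd`;
# Each line of du output is "<size><whitespace><path>";
# print the ones whose size is above the threshold as "size,path".
while ($line =~ /(\d+)\s+([^\r\n]+)\r?\n/g) {
    if ($1 > $dir{t}) {
        print $1 . "," . $2 . "\n";
    }
}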
The script runs the du command (with -m for megabyte units and -c for a grand total) to get size information. Any exclude pattern is passed straight through to du's --exclude option. The script then parses the output from du and prints, as size,path pairs, those directories whose size is greater than the threshold.
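The same du-then-filter pipeline can be approximated in a few lines of shell. This is only a minimal sketch, not the author's script: the function names big_dirs and filter_sizes are hypothetical, and it assumes a du that supports -m, -c and --exclude (GNU du does) plus a POSIX awk. Unlike the Perl version, it quotes the path and the pattern, so directory names with spaces work, but a brace list such as the one in the first example above is not expanded by the shell here.

```shell
#!/bin/sh
# Sketch only: big_dirs and filter_sizes are hypothetical helper names.

# Read "size<whitespace>path" lines on stdin and print "size,path"
# for every line whose size is strictly greater than the threshold ($1).
filter_sizes() {
    awk -v t="$1" '$1 + 0 > t + 0 {
        size = $1
        sub(/^[ \t]*[0-9]+[ \t]+/, "")   # strip the size column, keep the path
        print size "," $0
    }'
}

# big_dirs DIR [THRESHOLD_MB] [EXCLUDE_GLOB]
# Assumes du accepts -m, -c and --exclude (true for GNU du).
big_dirs() {
    dir=$1
    threshold=${2:-1000}    # same default as the Perl script
    pattern=$3              # optional single glob for --exclude
    if [ -n "$pattern" ]; then
        du -m -c --exclude="$pattern" -- "$dir"
    else
        du -m -c -- "$dir"
    fi | filter_sizes "$threshold"
}
```

After sourcing the file, something like big_dirs ../../ 100 'simulator*' corresponds to the second example run above. Keeping the filter in its own function makes it easy to try against saved du output.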
| Attachment | Size |
|---|---|
| file_pl.txt | 478 bytes |
Comments
strange, the find combined with -type and -size doesn't work
"find . -type d -regex .*h" works,
"find . -type d -regex .*h -size 10" doesn't. I tried on fc8 and fc11. I don't know why.
another one liner...
my incantation:
find . -type d -exec du -sk {} \; | sort -nr | less
du is recursive
du is recursive by itself, so just run du -h (-h for human-readable sizes). I also like du -sh * to spot the directory that's huge and then delve down into it... but you can script all kinds of things just with du.
simplest
/bin/du |awk '$1 > 1000'
Beats me
I didn't know about --exclude for du. All together,
is a great tip. Thanks to all!
find . -type d -exec du -s
find . -type d -exec du -s {} \; awk '{if ($1 > sumnum) print $0;}'
Just plug in a numeric value for sumnum
Oops - left out a pipe: find
Oops - left out a pipe:
find . -type d -exec du -s {} \; | awk '{if ($1 > sumnum) print $0;}'
Don't Abandon the Unix Way!
#CAN TOO!
find -type d | xargs du -s
#Sort by size
find -type d | xargs du -s | sort -n
#Sort by dir name
find -type d | sort | xargs du -s
#If some knucklehead put spaces in the dir names
find -type d -print0 | xargs -0 du -s
If you're going to stand up for the Unix way ...
find -type d | xargs du -s
... is redundant, simply equivalent to du without the -s, since du both knows how to recurse, and prefers to speak only of the directories.
And the version with find and xargs is markedly inefficient, because find and du will be harassing different directories at any point in time, potentially leading to dis[ck] thrashing (if your need is large enough to exhaust the dis[ck] cache).
I was just about to post
I was just about to post something similar.
Thanks
http://filmsbykris.com/
Everything you ever need to know about Open-Source Software.