Tech Tip: Find Directories Over a Certain Size

It's fairly simple to find large files on your system using commands such as find, but if you're looking for directories over a certain size, find won't help you. The Perl script presented here can help you track down directories that have grown unexpectedly large.
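For comparison, the file-level search mentioned above might look like the following sketch. The helper name large_files is made up for illustration. The size is given in bytes with the "c" suffix, which should work with any POSIX find.

```shell
# large_files DIR MIN_MB  (hypothetical helper name)
# Lists regular files under DIR that are larger than MIN_MB mebibytes.
# The threshold is converted to bytes because "-size +Nc" counts bytes.
large_files() {
    find "$1" -type f -size +"$(( $2 * 1024 * 1024 ))"c
}
```

For example, large_files /var/log 100 would list files over 100MB under /var/log. It says nothing about how much space each directory uses in total, which is the gap the script fills.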

The script prints those directories under a given path whose size is above a certain threshold. It also allows you to exclude directories that match a certain pattern from consideration. The command accepts the following options:

  • -d - Specifies the base directory to search (required).
  • -t - Specifies the threshold in megabytes (for example, 100 means 100MB). Defaults to 1000.
  • -x - Specifies the pattern to ignore (a glob pattern, passed to du's --exclude option).

The following examples show how it can be used:

$ ./file.pl -d ../../ -t 100 -x '{pr*,jd*,tp*,sim*}'
165,/export/home/fengd/CMS/apache-tomcat-6.0.13/logs
274,/export/home/fengd/CMS/apache-tomcat-6.0.13
318,/export/home/fengd/CMS
400,/export/home/fengd/apache-tomcat-6.0.13/bin
417,/export/home/fengd/apache-tomcat-6.0.13
909,/export/home/fengd
909,total
$ ./file.pl -d ../../ -t 100 -x 'simulator*'
178,/export/home/fengd/CMS/apache-tomcat-6.0.13/logs
289,/export/home/fengd/CMS/apache-tomcat-6.0.13
333,/export/home/fengd/CMS
400,/export/home/fengd/apache-tomcat-6.0.13/bin
422,/export/home/fengd/apache-tomcat-6.0.13
757,/export/home/fengd/project/cpp/fileTrans
766,/export/home/fengd/project/cpp
334,/export/home/fengd/project/log/tmp
492,/export/home/fengd/project/log
391,/export/home/fengd/project/store/array
391,/export/home/fengd/project/store
1755,/export/home/fengd/project
133,/export/home/fengd/tptp/config
200,/export/home/fengd/tptp
105,/export/home/fengd/jdk
2994,/export/home/fengd
2994,total

The source code for the command follows:

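#!/usr/bin/perl -w
use strict;
use Getopt::Std;
use Cwd 'abs_path';

# Parse -d (base directory), -t (threshold in MB) and -x (exclude pattern).
my %dir;
getopt("dtx",\%dir);

if(!defined $dir{d}){
	print "Usage: program -d dir [-t threshold] [-x exclude pattern]\n";
	exit 1;
}

# Default threshold: 1000MB.
if(!defined $dir{t}){
	$dir{t}=1000;
}

# Resolve the base directory and make sure it exists.
my $f=abs_path($dir{d});
if(!defined $f || !-d $f){
	print STDERR "Not a directory: $dir{d}\n";
	exit 1;
}

# Single-quote the path so that spaces and shell metacharacters survive.
(my $quoted=$f)=~s/'/'\\''/g;
# -m reports sizes in megabytes; -c appends a grand total line.
my $cmd="du -m -c '$quoted'";

# The pattern is handed to the shell unquoted, as in the original script.
if(defined $dir{x}){
	$cmd=$cmd." --exclude=$dir{x}";
}

# Print "size,path" for every du entry larger than the threshold.
my $line=`$cmd`;
while($line=~/(\d+)\s+([^\r\n]+)\r?\n/g){
	if($1>$dir{t}){
		print $1.",".$2."\n";
	}
}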

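The script uses the du command to get size information, with -m to report sizes in megabytes and -c to append a grand total. The exclude pattern is passed directly to du's --exclude option. The script then parses the output from du and prints, as size,path pairs, the directories whose size is greater than the threshold. The total line is printed too when it exceeds the threshold.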
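The same du-and-filter idea can be sketched directly in the shell. This is a rough equivalent rather than the script itself: filter_du_output and big_dirs are hypothetical names, awk stands in for the Perl regular expression, and --exclude assumes GNU du. Unlike the Perl version, it does not convert the directory to an absolute path.

```shell
# filter_du_output THRESHOLD_MB
# Reads "size<TAB>path" lines (the layout du -m prints) on stdin and
# prints "size,path" for every entry strictly larger than the threshold.
filter_du_output() {
    awk -F '\t' -v limit="$1" '$1 + 0 > limit + 0 { print $1 "," $2 }'
}

# big_dirs DIR [THRESHOLD_MB] [EXCLUDE_PATTERN]
# Runs du on DIR and keeps only the entries above the threshold.
# The threshold defaults to 1000, matching the script's default.
big_dirs() {
    dir=$1
    limit=${2:-1000}
    if [ -n "${3:-}" ]; then
        # --exclude is a GNU du option (assumption: GNU coreutils).
        du -m -c --exclude="$3" -- "$dir" | filter_du_output "$limit"
    else
        du -m -c -- "$dir" | filter_du_output "$limit"
    fi
}
```

A call such as big_dirs ../../ 100 'simulator*' mirrors the second example above. Splitting the filter into its own function keeps the threshold logic separate from the du invocation, so it can be checked against canned input.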
