Tech Tip: Find Directories Over a Certain Size

It's fairly simple to find large files on your system using commands such as find, but if you're looking for directories over a certain size find won't help you. The Perl script presented here can help you track down those explosively large directories.
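To see why find falls short here: its -size test looks at the size of the directory entry itself, not the total of what the directory contains. The following is a minimal sketch of that difference, assuming a shell with mktemp and dd available. The helper names dirs_by_find and usage_kb are made up for illustration, and the threshold passed to find is in 512-byte blocks.

```shell
#!/bin/sh
# Hypothetical helpers contrasting find's -size test with du's totals.

# List directories under $1 whose own size exceeds $2 512-byte blocks.
# find measures the directory entry itself, not the files inside it.
dirs_by_find() {
    find "$1" -type d -size +"$2"
}

# Print the total disk usage of $1 in kilobytes, contents included.
usage_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Demo: a scratch directory holding a single 2MB file.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/big.dat" bs=1024 count=2048 2>/dev/null

# Ask find for directories larger than 1MB (2048 blocks), then ask du.
echo "find matches: $(dirs_by_find "$demo" 2048 | wc -l)"
echo "du total (KB): $(usage_kb "$demo")"
rm -rf "$demo"
```

On a typical filesystem find reports no match for the scratch directory, while du counts the file stored inside it.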

The script prints those directories under a given path whose size is above a certain threshold. It also allows you to exclude directories that match a certain pattern from consideration. The command accepts the following options:

  • -d - Specifies the base directory to search.
  • -t - Specifies the threshold in megabytes (e.g., 100 means 100MB). Defaults to 1000 if omitted.
  • -x - Specifies the patterns to ignore (glob patterns).

The following examples show how it can be used:

$ ./file.pl -d ../../ -t 100 -x '{pr*,jd*,tp*,sim*}'
165,/export/home/fengd/CMS/apache-tomcat-6.0.13/logs
274,/export/home/fengd/CMS/apache-tomcat-6.0.13
318,/export/home/fengd/CMS
400,/export/home/fengd/apache-tomcat-6.0.13/bin
417,/export/home/fengd/apache-tomcat-6.0.13
909,/export/home/fengd
909,total
$ ./file.pl -d ../../ -t 100 -x 'simulator*'
178,/export/home/fengd/CMS/apache-tomcat-6.0.13/logs
289,/export/home/fengd/CMS/apache-tomcat-6.0.13
333,/export/home/fengd/CMS
400,/export/home/fengd/apache-tomcat-6.0.13/bin
422,/export/home/fengd/apache-tomcat-6.0.13
757,/export/home/fengd/project/cpp/fileTrans
766,/export/home/fengd/project/cpp
334,/export/home/fengd/project/log/tmp
492,/export/home/fengd/project/log
391,/export/home/fengd/project/store/array
391,/export/home/fengd/project/store
1755,/export/home/fengd/project
133,/export/home/fengd/tptp/config
200,/export/home/fengd/tptp
105,/export/home/fengd/jdk
2994,/export/home/fengd
2994,total

The source code for the command follows:

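#!/usr/bin/perl -w
use strict;
use Getopt::Std;
use Cwd 'abs_path';

# Parse -d, -t and -x; each option takes an argument.
my %dir;
getopt("dtx",\%dir);

if(!defined $dir{d}){
	print "Usage: program -d dir [-t threshold] [-x exclude pattern]\n";
	exit 1;
}

# Default threshold: 1000MB.
if(!defined $dir{t}){
	$dir{t}=1000;
}

# Build the du command: -m reports sizes in megabytes, -c adds a grand total.
# The path is quoted so that directory names containing spaces survive.
my $f=abs_path($dir{d});
my $cmd="du -m -c \"$f\"";

# The exclude pattern is handed to du unchanged.
if(defined $dir{x}){
	$cmd=$cmd." --exclude=$dir{x}";
}

# Print "size,path" for every line of du output above the threshold.
my $line=`$cmd`;
while($line=~/(\d+)\s+([^\r\n]+)\r?\n/g){
	if($1>$dir{t}){
		print $1.",".$2."\n";
	}
}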

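The script uses the du command to get size information: du -m reports sizes in megabytes, and -c appends a grand total, which is why the sample output ends with a "total" line. The exclude patterns are passed directly to the du command. The script then processes the output from du and prints those directories whose size exceeds the threshold.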
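The filtering step can be sketched in shell as well. This is a rough equivalent, not the author's script. It assumes a du that supports -m and --exclude (as GNU du does), and the function names filter_over and big_dirs are made up for illustration.

```shell
#!/bin/sh
# Hypothetical shell sketch of the same idea: run du, keep the large entries.

# Read "size<whitespace>path" lines on stdin and print "size,path" for
# every entry whose size is strictly greater than the threshold in $1.
filter_over() {
    awk -v t="$1" '$1 + 0 > t + 0 {
        size = $1
        sub(/^[0-9]+[ \t]+/, "")   # drop the size column, keep the path intact
        print size "," $0
    }'
}

# big_dirs DIR [THRESHOLD_MB] [EXCLUDE_PATTERN]
# Assumes GNU du for -m and --exclude; the default of 1000 mirrors the script.
big_dirs() {
    if [ -n "${3:-}" ]; then
        du -m -c --exclude="$3" "$1"
    else
        du -m -c "$1"
    fi | filter_over "${2:-1000}"
}
```

For example, big_dirs ~/project 100 'sim*' would list everything over 100MB under ~/project while skipping entries that match sim*. The exclude argument is quoted here, so it reaches du as a single glob pattern.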

Attachment: file_pl.txt (478 bytes)

Comments

strange, the find combined with -type and -size doesn't work

Anonymous

"find . -type d -regex .*h" works,
"find . -type d -regex .*h -size 10" doesn't. I tried on fc8 and fc11. I don't know why.

another one liner...

Terry

my incantation:

find . -type d -exec du -sk {} \; | sort -nr | less

du is recursive

Caleb Cushing (xenoterracide)

du is recursive by itself so just run du -h (for human readable). I also like du -sh * and then to delve down to the directory that's huge... but you can script all kinds of things just with du.

simplest

Caleb Cushing (xenoterracide)

/bin/du |awk '$1 > 1000'

Beats me

Xebeche

I didn't know about --exclude for du. All together,

du --exclude=... | awk '$1 > threshold'

is a great tip. Thanks to all!

find . -type d -exec du -s

Anonymous

find . -type d -exec du -s {} \; awk '{if ($1 > sumnum) print $0;}'

Just plug in a numeric value for sumnum

Oops - left out a pipe: find

Anonymous

Oops - left out a pipe:

find . -type d -exec du -s {} \; | awk '{if ($1 > sumnum) print $0;}'

Don't Abandon the Unix Way!

Bob Hope

#CAN TOO!
find -type d | xargs du -s

#Sort by size
find -type d | xargs du -s | sort -n

#Sort by dir name
find -type d | sort | xargs du -s

#If some knucklehead put spaces in the dir names
find -type d -print0 | xargs -0 du -s

If you're going to stand up for the Unix way ...

Jack Repenning

find -type d | xargs du -s

... is redundant, simply equivalent to du without the -s, since du both knows how to recurse, and prefers to speak only of the directories.

And the version with find and xargs is markedly inefficient, because find and du will be harassing different directories at any point in time, potentially leading to dis[ck] thrashing (if your need is large enough to exhaust the dis[ck] cache).

I was just about to post

metalx2000

I was just about to post something similar.
Thanks

http://filmsbykris.com/
Everything you ever need to know about Open-Source Software.
