Treating Compressed and Uncompressed Data Sources the Same

Occasionally, you need to process a number of files, some of which have been compressed and some of which have not (think log files). Rather than maintaining two variations of the same pipeline, one for compressed input and one for uncompressed, wrap the difference in a bash function:

function data_source ()
{
 local F=$1

 # strip the .gz suffix if it's there
 F=${F%.gz}

 if [[ -f $F ]] ; then
   # uncompressed file: pass it through unchanged
   cat "$F"
 elif [[ -f $F.gz ]] ; then
   # compressed file: decompress to stdout at low priority
   nice gunzip -c "$F.gz"
 fi
}

which nicely allows:

for file in * ; do
 data_source "$file" | ...
done

Whether you're dealing with gzip'd files or uncompressed ones, you no longer have to think about which kind you have. With a little more effort, bzip2 files could also be detected and handled.
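
As a rough sketch of that bzip2 extension, the variant below adds a third branch. It assumes the bzip2 command is installed. The name data_source_any and the error branch for missing files are illustrative additions, not part of the function shown earlier.

```shell
# Hypothetical variant of data_source that also recognizes .bz2 files.
data_source_any ()
{
 local F=$1

 # strip one known compression suffix if it's there
 case $F in
   *.gz)  F=${F%.gz} ;;
   *.bz2) F=${F%.bz2} ;;
 esac

 if [ -f "$F" ] ; then
   # uncompressed file: pass it through unchanged
   cat "$F"
 elif [ -f "$F.gz" ] ; then
   nice gunzip -c "$F.gz"
 elif [ -f "$F.bz2" ] ; then
   nice bzip2 -dc "$F.bz2"
 else
   # illustrative addition: report a missing file instead of staying silent
   echo "data_source_any: no such file: $F" >&2
   return 1
 fi
}
```

As with the original, an uncompressed file takes precedence when more than one variant of the same name exists.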
