Treating Compressed and Uncompressed Data Sources the Same
December 19th, 2008 by David Sinck
Occasionally, you need to process a number of files, some of which have been compressed and some of which have not (think log files). Rather than maintaining two variations of your command, one for compressed input and one for uncompressed, wrap the difference in a bash function:
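function data_source ()
{
    local F=$1
    # strip the .gz suffix if it's there (the dot is escaped so it matches literally)
    F=$(echo "$F" | perl -pe 's/\.gz$//')
    if [[ -f $F ]] ; then
        cat "$F"
    elif [[ -f $F.gz ]] ; then
        # the suffix was stripped above, so add it back for gunzip
        nice gunzip -c "$F.gz"
    fi
}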
which nicely allows:
for file in * ; do data_source "$file" | ... ; done
Whether you're dealing with gzip'd files or uncompressed ones, you no longer have to treat them differently mentally. With a little more effort, bzip2 files could also be detected and handled.
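One way the bzip2 case might look is sketched below. This is an illustration rather than the author's own extension: it assumes gzip and bzip2 are installed and that compressed files carry the usual .gz or .bz2 suffix, and the error message for a missing file is an addition.

```shell
# Sketch: data_source extended to bzip2 (assumes gzip and bzip2 are installed).
data_source() {
    local f
    f=${1%.gz}        # strip a trailing .gz, if present
    f=${f%.bz2}       # strip a trailing .bz2, if present
    if [ -f "$f" ]; then
        cat -- "$f"
    elif [ -f "$f.gz" ]; then
        gzip -dc -- "$f.gz"
    elif [ -f "$f.bz2" ]; then
        bzip2 -dc -- "$f.bz2"
    else
        # Not in the original tip: report a missing file instead of staying silent.
        echo "data_source: no such file: $1" >&2
        return 1
    fi
}
```

As in the original function, the uncompressed copy wins when both versions of a file exist.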
Remember the -exec action of find
On May 4th, 2009 Martijn Ras says:
I don't like using 'for file in *; do ...; done' because it fails on filenames that contain spaces; the -exec action of find makes avoiding this problem very easy. It might also be a good idea to print the name of each file before outputting its content in the loop.
I'd use the following:
find . -maxdepth 1 -type f -exec sh -c 'echo "$1:" ; bzcat "$1" 2>/dev/null || zcat "$1" 2>/dev/null || cat "$1" 2>/dev/null' sh '{}' \;
Or, if you are inclined to use a shell script, create "data_source.sh" and 'chmod +x' it:
#!/bin/sh
echo "$1:"
bzcat "$1" 2>/dev/null || zcat "$1" 2>/dev/null || cat "$1" 2>/dev/null
and use the following:
find . -maxdepth 1 -type f -exec ./data_source.sh '{}' \;
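A quoted loop is another option. The glob keeps each name intact, and the word splitting comes from expanding $file without quotes. The following is a minimal sketch, not a replacement for the find approach. It assumes gzip is installed and relies on gzip's -f option passing uncompressed input through unchanged when combined with -c, which is the documented GNU gzip behavior. The name show_all is hypothetical.

```shell
# Hypothetical helper: print every regular file in a directory, name first.
show_all() {
    local dir file
    dir=${1:-.}
    for file in "$dir"/*; do
        [ -f "$file" ] || continue     # skip directories and an unmatched glob
        printf '%s:\n' "$file"         # quoted, so spaces in names survive
        gzip -dcf -- "$file"           # gunzip if compressed, else copy as-is
    done
}
```

Calling show_all with no argument walks the current directory. Unlike the find command above, it skips hidden files, because a plain * glob does not match names that start with a dot.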
I'd prefer using a case
On December 30th, 2008 john.mckown says:
I'd prefer using a case structure similar to:
function data_source ()
{
    local F=$1
    case $F in
        *.gz) zcat "$F";;
        *.bz2) bzcat "$F";;
        *) cat "$F";;
    esac
}
This only invokes one external command to do the "cat" function. And it is very easy to extend to other possible suffixes that may be special.
Decompression in a pipe
On December 19th, 2008 roaima says:
If you're processing files in a pipe, try this:
bzcat -f "${FILE}" | zcat -f | ...
It doesn't matter whether ${FILE} is compressed with bzip2, gzip, or even not compressed at all. It just works. (Assuming you have bzcat and zcat installed, of course.)
Chris
I don't see an -f switch for
On December 30th, 2008 john.mckown says:
I don't see an -f switch for bzcat on my system. And I don't understand the piping. Do you mean something like:
bzcat $F || zcat $F || cat $F
?
I don't like the above as it is inefficient. If bzcat fails, then zcat is tried and if it fails, then normal cat is done. Of course, the good part is that this is not dependent on a file suffix.
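Suffix independence does not have to mean trial and error. A possible middle ground is to look at the first bytes of the file: gzip data starts with the bytes 1f 8b, and bzip2 data starts with "BZh" (42 5a 68). The sketch below assumes head -c, od, gzip and bzip2 are available. The name smart_cat is made up for illustration.

```shell
# Hypothetical helper: choose the decompressor from the file's magic bytes.
smart_cat() {
    local magic
    # First three bytes as lowercase hex with no separators, e.g. "1f8b08".
    magic=$(head -c 3 < "$1" | od -An -tx1 | tr -d ' \n')
    case $magic in
        1f8b*)   gzip -dc < "$1" ;;     # gzip magic number
        425a68*) bzip2 -dc < "$1" ;;    # "BZh", the bzip2 header
        *)       cat < "$1" ;;          # anything else is passed through
    esac
}
```

Only one decompressor runs per file, and the file name plays no part in the decision.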
Another way to decompress in a pipe
On December 25th, 2008 Anonymous (not verified) says:
Another way if bzcat is not installed...
gzip -dc "${FILE}" | ...
Perl?
On December 19th, 2008 Ian (not verified) says:
Reaching for the Advanced Bash Scripting Guide...
Check out Table B-5 String Operations and
${string%%substring} Strip longest match of $substring from back of $string
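For the .gz case, that might look like the short sketch below. The path is made up for illustration. The single-% form strips the shortest match, which is all a fixed suffix needs. The last two lines show how % and %% differ once the pattern contains a wildcard.

```shell
f=logs/access.log.gz     # hypothetical path, for illustration only

base=${f%.gz}            # logs/access.log  (drop a literal .gz suffix)
short=${f%.*}            # logs/access.log  (shortest suffix matching .*)
long=${f%%.*}            # logs/access      (longest suffix matching .*)
```

No external process is started, and a name without the suffix is left unchanged.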
Re: Perl?
On December 20th, 2008 Vance (not verified) says:
${string%%substring} Strip longest match of $substring from back of $string
Great tip. Or if you want to kick it old-school UNIX style:
F=`dirname $F`/`basename $F .gz`