Treating Compressed and Uncompressed Data Sources the Same
Dec 19, 2008 By David Sinck
Occasionally, you need to process a number of files, some of which have been compressed and some of which have not (think log files). Rather than writing two variations of your pipeline, one for compressed input and one for uncompressed, wrap the difference in a bash function:
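function data_source ()
{
    local F=$1
    # strip the .gz suffix if it's there (the dot is escaped so it matches only a literal ".")
    F=$(echo "$F" | perl -pe 's/\.gz$//')
    if [[ -f $F ]] ; then
        cat "$F"
    elif [[ -f $F.gz ]] ; then
        # decompress the .gz copy explicitly, at lower CPU priority
        nice gunzip -c "$F.gz"
    fi
}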
which nicely allows:
for file in * ; do data_source "$file" | ... ; done
Whether you're dealing with gzipped files or uncompressed ones, you no longer have to think about them differently. With a little more effort, bzip2-compressed files could also be detected and handled.
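As a minimal sketch of that extra effort, assuming bash with gzip and bzip2 on the PATH, the suffix can be stripped with bash parameter expansion instead of perl and a .bz2 branch added. The uncompressed copy still wins when more than one variant exists, and, unlike the original, this version reports a missing file rather than silently printing nothing.

```bash
#!/usr/bin/env bash
# Sketch: data_source extended to bzip2 (assumes gzip and bzip2 are installed).
data_source() {
    local base=$1
    # Strip a trailing .gz or .bz2 so "x", "x.gz" and "x.bz2" all name one source
    base=${base%.gz}
    base=${base%.bz2}
    if [[ -f $base ]]; then
        cat -- "$base"
    elif [[ -f $base.gz ]]; then
        nice gzip -dc -- "$base.gz"
    elif [[ -f $base.bz2 ]]; then
        nice bzip2 -dc -- "$base.bz2"
    else
        # Nothing found under any of the three names
        printf 'data_source: no data for %s\n' "$1" >&2
        return 1
    fi
}
```

Usage is the same as before, for example `data_source /var/log/syslog.2.gz | grep sshd` (the path is only an illustration).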
Comments
Remember the -exec action of find
I don't like using 'for file in *; do ...; done' because it breaks on filenames that contain spaces unless every $file expansion is quoted; find's -exec action makes this problem very easy to avoid. It might also be a good idea to print the name of each file before outputting its contents in the loop.
I'd use the following:
find . -maxdepth 1 -type f -exec sh -c 'echo "$1:"; bzcat "$1" 2>/dev/null || zcat "$1" 2>/dev/null || cat "$1" 2>/dev/null' sh '{}' \;
Or, if you prefer a shell script, create "data_source.sh" and 'chmod +x' it:
#!/bin/sh
echo "$1:"
bzcat "$1" 2>/dev/null || zcat "$1" 2>/dev/null || cat "$1" 2>/dev/null
and use the following:
find . -maxdepth 1 -type f -exec ./data_source.sh '{}' \;
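A variant of the find approach, sketched here under the assumption that bzip2 and gzip are installed, hands the filenames to the inner shell as positional arguments instead of splicing {} into the command string, so names containing spaces or quotes are safe. It also uses -exec ... {} + so that one sh process handles a whole batch of files. The function name cat_all_here is hypothetical, and gzip -dc stands in for zcat.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every regular file in the current directory,
# decompressing bzip2/gzip files and passing other files through unchanged.
cat_all_here() {
    # -maxdepth is a GNU/BSD find extension, as in the comment above
    find . -maxdepth 1 -type f -exec sh -c '
        for f do
            printf "%s:\n" "$f"
            # Try each decoder in turn; the first one that accepts the data wins
            bzip2 -dc -- "$f" 2>/dev/null ||
                gzip -dc -- "$f" 2>/dev/null ||
                cat -- "$f"
        done
    ' sh {} +
}
```

Run it from the directory of interest, for example `(cd /var/log && cat_all_here) | less`.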
I'd prefer using a case
I'd prefer using a case structure similar to:
function data_source ()
{
    local F=$1
    # pick the reader from the file's suffix
    case $F in
        *.gz)  zcat "$F" ;;
        *.bz2) bzcat "$F" ;;
        *)     cat "$F" ;;
    esac
}
This invokes only one external command to do the "cat" step, and it is very easy to extend to any other suffixes that need special handling.
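To illustrate how easily the case version extends, here is a sketch that adds .xz and .zst branches. It assumes the xz and zstd tools are installed for those suffixes; gzip -dc is used in place of zcat (which on some older systems only handles .Z files), and the quoting and the -- guards against names beginning with a dash are additions, not part of the comment.

```bash
#!/usr/bin/env bash
# Sketch: case-based data_source with extra suffixes (xz and zstd assumed installed).
data_source() {
    local f=$1
    case $f in
        *.gz)  gzip -dc -- "$f" ;;
        *.bz2) bzip2 -dc -- "$f" ;;
        *.xz)  xz -dc -- "$f" ;;
        *.zst) zstd -dc -- "$f" ;;
        *)     cat -- "$f" ;;    # no recognized suffix: treat as plain data
    esac
}
```

Each new format costs one line, and still only one external command runs per file.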
Decompression in a pipe
If you're processing files in a pipe, try this:
bzcat -f "${FILE}" | zcat -f | ...
It doesn't matter whether ${FILE} is compressed with bzip2, gzip, or even not compressed at all. It just works (assuming you have bzcat and zcat installed, of course).
Chris
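Wrapped as a function, the same pipe might look like the sketch below. It spells the tools as bzip2 -dcf and gzip -dcf; with -f (and output to stdout), each program copies data it does not recognize to standard output unchanged, which is what lets the chain handle all three cases without looking at the filename. The name any_cat is hypothetical, and the sketch assumes bzip2 and gzip are installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decompress bzip2 or gzip data, or pass plain data through.
any_cat() {
    # bzip2 -dcf decodes bzip2 data and copies anything else unchanged;
    # gzip -dcf then does the same for gzip data read from the pipe.
    bzip2 -dcf -- "$1" | gzip -dcf
}
```

For example, `any_cat access.log.1 | grep 404` works whether or not access.log.1 is compressed, even though it has no telltale suffix.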
I don't see an -f switch for
I don't see an -f switch for bzcat on my system. And I don't understand the piping. Do you mean something like:
bzcat "$F" || zcat "$F" || cat "$F"
?
I don't like the above because it is inefficient: if bzcat fails, zcat is tried, and if that fails too, plain cat is run. Of course, the good part is that it doesn't depend on a file suffix.
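One way to stay independent of the suffix without trying each decompressor in turn is to inspect the file's first bytes: gzip data begins with the bytes 1f 8b, and bzip2 data with the ASCII characters "BZh". The sketch below is an illustration under assumptions, not something from the thread: it relies on head -c (supported by GNU and BSD head, though not POSIX) and od, and the function name sniff_cat is hypothetical. A plain-text file that happens to begin with "BZh" would be misdetected.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick the decoder from the file's magic bytes,
# so only one decoder (or cat) runs per file.
sniff_cat() {
    local f=$1 magic
    # First three bytes as lowercase hex, e.g. "1f8b08" for gzip
    magic=$(head -c 3 -- "$f" | od -An -tx1 | tr -d ' \n')
    case $magic in
        1f8b*)  gzip -dc -- "$f" ;;   # gzip magic number
        425a68) bzip2 -dc -- "$f" ;;  # "BZh", the bzip2 signature
        *)      cat -- "$f" ;;        # anything else: plain data
    esac
}
```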
Another way to decompress in a pipe
Another way if bzcat is not installed...
gzip -dcf "${FILE}" | ...
Perl?
Reaching for the Advanced Bash Scripting Guide...
Check out Table B-5 (String Operations), in particular:
${string%%substring} Strip longest match of $substring from back of $string
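Applied to the article's function, this means the perl call can become a plain parameter expansion. For a literal suffix such as .gz, % and %% give the same result; they differ only when the pattern contains a wildcard, as the last two lines below show. The helper strip_gz is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop one trailing ".gz", leaving other names alone.
strip_gz() {
    printf '%s\n' "${1%.gz}"
}

strip_gz access.log.gz    # prints access.log
strip_gz access.log       # prints access.log (no suffix, so unchanged)

f=a.gz.gz
echo "${f%.gz*}"          # shortest match of ".gz*" removed: a.gz
echo "${f%%.gz*}"         # longest match of ".gz*" removed: a
```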
Re: Perl?
${string%%substring} Strip longest match of $substring from back of $string
Great tip. Or if you want to kick it old-school UNIX style:
F=`dirname $F`/`basename $F .gz`
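The same line with $(...) substitution and quoting, so that paths containing spaces survive, might look like the sketch below (the function name is hypothetical). One difference from ${F%.gz} worth knowing: dirname prints "." for a bare filename, so foo.gz becomes ./foo rather than foo, which still names the same file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip a trailing .gz with dirname/basename,
# quoted so paths containing spaces are handled.
strip_gz_old_school() {
    local F=$1
    F="$(dirname "$F")/$(basename "$F" .gz)"
    printf '%s\n' "$F"
}

strip_gz_old_school "/var/log/my app.log.gz"   # prints /var/log/my app.log
strip_gz_old_school foo.gz                       # prints ./foo
```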