This article is about filtering, a very powerful facility available to every Linux user, but one which migrants from other operating systems may find new and unusual.
Pipes: When One Filter Isn't Enough

The basic principle of the pipe (|) is that it allows us to connect the standard output of one program to the standard input of another. (See “Introduction to Named Pipes” by Andy Vaught, September 1997.) A moment's thought should make obvious how useful this is when combined with filters. We can build complex “programs”, on the command line or in a shell script, simply by stringing filters together.
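A minimal illustration of the principle (the sample text here is invented): printf writes three lines to its standard output, and the pipe feeds them to the standard input of wc, which counts them.

```shell
# printf's standard output becomes wc's standard input:
# three lines in, so wc -l reports 3.
printf 'one\ntwo\nthree\n' | wc -l
```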

The filter wc (word count) puts its output in four columns by default. Instead of specifying the -c switch to count only characters, give this command:

wc lj.filters | awk ' { print $3 } '

This takes the output of wc:

258    1558    8921 lj.filters

and filters it to print only the third column, the character count, to the screen:

8921

If you want to print the whole input line, use $0 instead of $3.
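The difference between the two can be seen directly by feeding awk a fixed line (the sample line is the wc output shown above):

```shell
# $3 selects the third whitespace-separated field; $0 is the whole line.
echo '258    1558    8921 lj.filters' | awk '{ print $3 }'
echo '258    1558    8921 lj.filters' | awk '{ print $0 }'
```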

Another handy filtering pipe is one that does a simple filtering of ls -a output in order to see only the hidden files, whose names begin with a dot (note the quoting, which keeps the shell from expanding the pattern before grep sees it):

ls -a | grep '^\.'

Of course, pipes greatly increase the power of programmable filters such as sed and awk.

Data stored in simple ASCII tables can be manipulated by AWK. As a simple example, consider the weights and measures converter shown in Listing 2. We have a simple text file of conversions:

From    To      Rate
---     ---     ----
kg      lb      2.20
lb      kg      0.4536
st      lb      14
lb      st      0.07
kg      st      0.15
st      kg      6.35
in      cm      2.54
cm      in      0.394

To execute the script, give the command:

weightconv 100 kg lb

The result returned is:

220 lb

Listing 2.
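Listing 2 itself is not reproduced here, but a minimal sketch of such a converter shows the idea. The function name is the article's; the embedded table is an assumption made to keep the example self-contained (the article's script reads the table from a separate text file), and the output format is illustrative.

```shell
# Hypothetical sketch of a weights-and-measures converter in the
# spirit of Listing 2. awk scans the table for the row whose first
# two fields match the requested units, then multiplies by the rate.
weightconv() {
    awk -v val="$1" -v from="$2" -v to="$3" '
        $1 == from && $2 == to { printf "%g %s\n", val * $3, to }
    ' <<'EOF'
kg lb 2.20
lb kg 0.4536
st lb 14
lb st 0.07
kg st 0.15
st kg 6.35
in cm 2.54
cm in 0.394
EOF
}

weightconv 100 kg lb
```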

Power Filters

The classic example of “filtered pipelines” is from the book The UNIX Programming Environment:

cat $* |
tr -sc A-Za-z '\012' |
sort |
uniq -c |
sort -n |
tail
First, we concatenate all the input into one stream using cat. Next, we put each word on a separate line using tr: the -s squeezes repeats, and the -c means to use the complement of the pattern given, i.e., anything that's not A-Za-z. Together, they replace every run of characters that don't make up words with a newline; this has the effect of putting each word on a separate line. Then we sort, so that repeated words end up on adjacent lines, and feed the output into uniq, which strips out the duplicates and, with the -c argument, prefixes each word with a count of the number of times it was found. We then sort numerically (-n), which gives us a list of words ordered by frequency. Finally, tail prints only the last ten lines of the output. We now have a simple word-frequency counter: for any text input, it will output a list of the ten most frequently used words.
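The pipeline can be tried on a one-line sample (the sample sentence is invented for illustration; printf stands in for cat $* as the input source):

```shell
# The word-frequency pipeline on a small sample. "the" occurs three
# times, so it is printed last, with the highest count.
printf 'the cat sat on the mat the end\n' |
tr -sc A-Za-z '\012' |
sort |
uniq -c |
sort -n |
tail
```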


The combination of filters and pipes is very powerful, because it allows you to break down tasks and then pick the best tool for each task. Many jobs that would otherwise have to be handled in a programming language can be done under Linux by stringing together a few simple filters on the command line. Even when a programming language must be used for a particularly complicated filter, you still save a lot of development effort by doing as much as possible using existing tools.

I hope this article has given you some idea of this power. Working with your Linux box should be both easier and more productive using filters and pipes.

All listings referred to in this article are available by anonymous download in the file

Paul Dunne is an Irish writer and consultant who specializes in Linux. The only deadline he has ever met was the one for his very first article. His home page is at