Filters

This article is about filtering, a very powerful facility available to every Linux user, but one which migrants from other operating systems may find new and unusual.
Pipes: When One Filter Isn't Enough

The basic principle of the pipe (|) is that it allows us to connect the standard output of one program with the standard input of another. (See “Introduction to Named Pipes” by Andy Vaught, September 1997.) A moment's thought should make the usefulness of this when combined with filters quite obvious. We can build complex instructions 'programs', on the command line or in a shell script, simply by stringing filters together.

The filter wc (word count) puts its output in four columns by default. Instead of specifying the -c switch to count only characters, give this command:

wc lj.filters | awk ' { print $3 } '

This takes the output of wc:

258    1558    8921 lj.filters
and filters it to print only the third column, the character count, to the screen:
8921
If you want to print the whole input line, use $0 instead of $3.

Another handy filtering pipe is one that does a simple filtering of ls -a output in order to see only the hidden files:

ls -a| grep ^[.].*

Of course, pipes greatly increase the power of programmable filters such as sed and awk.

Data stored in simple ASCII tables can be manipulated by AWK. As a simple example, consider the weights and measures converter shown in Listing 2. We have a simple text file of conversions:

From    To      Rate---     ---     ----
kg      lb      2.20
lb      kg      0.4536
st      lb      14
lb      st      0.07
kg      st      0.15
st      kg      6.35
in      cm      2.54
cm      in      0.394

To execute the script, give the command:

weightconv 100 kg lb
The result returned is:
220
Listing 2.

Power Filters

The classic example of “filtered pipelines” is from the book The UNIX Programming Environment:

cat $* |tr -sc A-Za-z '\012' |
sort |
uniq -c |
sort -n |
tail

First, we concatenate all the input into one file using cat. Next, we put each word on a separate line using tr: the -s squeezes, the -c means to use the complement of the pattern given, i.e., anything that's not A-Za-z. Together, they strip out all characters that don't make up words and replace them with a new line; this has the effect of putting each word on a separate line. Then we feed the output of tr into uniq, which strips out duplicates and, with the -c argument, prints a count of the number of times a duplicate word has been found. We then sort numerically (-n), which gives us a list of words ordered by frequency. Finally, we print only the last ten lines of the output. We now have a simple word frequency counter. For any text input, it will output a list of the ten most frequently used words.

Conclusion

The combination of filters and pipes is very powerful, because it allows you to break down tasks and then pick the best tool for each task. Many jobs that would otherwise have to be handled in a programming language can be done under Linux by stringing together a few simple filters on the command line. Even when a programming language must be used for a particularly complicated filter, you still save a lot of development effort by doing as much as possible using existing tools.

I hope this article has given you some idea of this power. Working with your Linux box should be both easier and more productive using filters and pipes.

All listings referred to in this article are available by anonymous download in the file ftp.linuxjournal.com/pub/lj/listings/issue65/2479.tgz.

Paul Dunne (paul@dunne.ie.eu.org) is an Irish writer and consultant who specializes in Linux. The only deadline he has ever met was the one for his very first article. His home page is at http://www.cix.co.uk/~dunnp/

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState