Filters

This article is about filtering, a very powerful facility available to every Linux user, but one which migrants from other operating systems may find new and unusual.

Pipes: When One Filter Isn't Enough

The basic principle of the pipe (|) is that it allows us to connect the standard output of one program with the standard input of another. (See “Introduction to Named Pipes” by Andy Vaught, September 1997.) Since a filter reads standard input and writes standard output, the usefulness of pipes in combination with filters is obvious. We can build complex instructions, in effect small programs, on the command line or in a shell script simply by stringing filters together.
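
As a minimal sketch of the idea, three standard filters can be chained so that each reads what the previous one wrote. The sample words and the variable name first are assumptions made up for this illustration:

```shell
# printf writes three lines, sort orders them, head keeps the first one.
# The input words are sample data invented for this illustration.
first=$(printf 'pear\napple\nfig\n' | sort | head -n 1)
echo "$first"
```

Each stage knows nothing about its neighbours; it only reads standard input and writes standard output.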

The filter wc (word count) prints four columns by default: the number of lines, words and characters, followed by the file name. Instead of specifying the -c switch to count only characters, give this command:

wc lj.filters | awk ' { print $3 } '

This takes the output of wc:

258    1558    8921 lj.filters

and filters it to print only the third column, the character count, to the screen:

8921

If you want to print the whole input line, use $0 instead of $3.
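
To try the column selection without the lj.filters file, the sketch below builds a small temporary file (a hypothetical stand-in, as are the variable names) and compares the awk result with what wc -c reports directly:

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

# wc prints lines, words, characters and the file name; awk keeps field 3.
chars=$(wc "$sample" | awk '{ print $3 }')
echo "$chars"

# The same figure from wc -c reading standard input; awk strips any padding.
chars_direct=$(wc -c < "$sample" | awk '{ print $1 }')
echo "$chars_direct"

rm -f "$sample"
```

Both commands should report the same character count for the sample file.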

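Another handy pipe filters the output of ls -a in order to show only the hidden files, that is, those whose names begin with a dot. The pattern is quoted so that the shell passes it to grep unchanged instead of trying to expand it as a file name pattern:

ls -a | grep '^[.]'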
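
The following sketch runs such a pipe against a scratch directory so the result is predictable. The directory and file names are hypothetical, and the pattern is quoted here so the shell cannot expand it before grep sees it:

```shell
# Hypothetical scratch directory with one hidden and two visible files.
dir=$(mktemp -d)
touch "$dir/.hidden" "$dir/visible" "$dir/a.b"

# Keep only the names that start with a dot.
hidden=$(ls -a "$dir" | grep '^[.]')
echo "$hidden"

rm -rf "$dir"
```

Note that ls -a also lists the entries . and .., so they pass the filter along with .hidden, while a.b does not because its dot is not at the start of the name.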

Of course, pipes greatly increase the power of programmable filters such as sed and awk.

Data stored in simple ASCII tables can be manipulated by AWK. As a simple example, consider the weights and measures converter shown in Listing 2. We have a simple text file of conversions:

From    To      Rate
---     ---     ----
kg      lb      2.20
lb      kg      0.4536
st      lb      14
lb      st      0.07
kg      st      0.15
st      kg      6.35
in      cm      2.54
cm      in      0.394

To execute the script, give the command:

weightconv 100 kg lb

The result returned is:

220

Listing 2.
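
Listing 2 itself is not reproduced here. What follows is only a sketch of how such a converter might be written with awk, not the author's script: the temporary table file, the function body and the variable names are all assumptions made for illustration.

```shell
# Hypothetical table file; the article does not name the real one.
table=$(mktemp)
cat > "$table" <<'EOF'
From    To      Rate
---     ---     ----
kg      lb      2.20
lb      kg      0.4536
st      lb      14
lb      st      0.07
kg      st      0.15
st      kg      6.35
in      cm      2.54
cm      in      0.394
EOF

# weightconv AMOUNT FROM TO: find the row whose first two columns match
# the requested units and multiply the amount by the rate in column 3.
weightconv() {
    awk -v amount="$1" -v from="$2" -v to="$3" '
        $1 == from && $2 == to { print amount * $3; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$table"
}

weightconv 100 kg lb
```

The header lines never match a unit name, so they need no special handling. If no row matches, the function prints nothing and returns a non-zero status. Remove the temporary file with rm -f "$table" when finished.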

Power Filters

The classic example of “filtered pipelines” is from the book The UNIX Programming Environment:

cat $* |tr -sc A-Za-z '\012' |
sort |
uniq -c |
sort -n |
tail

First, we concatenate all the input into one stream using cat. Next, we put each word on a separate line using tr: the -c means to use the complement of the pattern given, i.e., anything that's not A-Za-z, and the -s squeezes each run of repeated output characters into one. Together, they replace every run of non-letter characters with a single newline ('\012'), which has the effect of putting each word on a separate line. We then sort the words so that identical words end up on adjacent lines; this matters because uniq compares only neighbouring lines. The sorted output is fed into uniq, which collapses duplicates and, with the -c argument, prefixes each word with the number of times it occurred. We then sort numerically (-n), which gives us a list of words in ascending order of frequency. Finally, tail prints only the last ten lines of the output. We now have a simple word frequency counter: for any text input, it will output the ten most frequently used words, with the most frequent last.
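
To experiment with the pipeline, it can be wrapped in a shell function and fed a short sample. The function name wordfreq and the sample sentence are assumptions for this sketch, and "$@" is used in place of $* so that file names containing spaces survive:

```shell
# wordfreq [FILE...]: the ten most frequent words, most common last.
# With no arguments, cat reads standard input.
wordfreq() {
    cat "$@" |
    tr -sc 'A-Za-z' '\012' |   # one word per line
    sort |                     # bring identical words together
    uniq -c |                  # count each distinct word
    sort -n |                  # order by count, ascending
    tail                       # keep the last ten lines
}

# Sample input invented for this illustration.
printf 'the cat and the dog and the bird\n' | wordfreq
```

For this input, the last line of output shows "the" with a count of 3. Because the pattern is case-sensitive, "The" and "the" would be counted as different words; adding a tr A-Z a-z stage before the sort would fold them together.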

Conclusion

The combination of filters and pipes is very powerful, because it allows you to break down tasks and then pick the best tool for each task. Many jobs that would otherwise have to be handled in a programming language can be done under Linux by stringing together a few simple filters on the command line. Even when a programming language must be used for a particularly complicated filter, you still save a lot of development effort by doing as much as possible using existing tools.

I hope this article has given you some idea of this power. Working with your Linux box should be both easier and more productive using filters and pipes.

All listings referred to in this article are available by anonymous download in the file ftp.linuxjournal.com/pub/lj/listings/issue65/2479.tgz.

Paul Dunne (paul@dunne.ie.eu.org) is an Irish writer and consultant who specializes in Linux. The only deadline he has ever met was the one for his very first article. His home page is at http://www.cix.co.uk/~dunnp/
