Filters: Doing It Your Way
One of the basic philosophies of Linux (as with all flavours of Unix) is that each program does one particular task, and does it well. Often you combine several programs to achieve something, either at the shell prompt or in a script, by piping the output of one program into the next. I'm talking about things like
ls -l | more
and
ps -auxw | grep netscape >> people.who.should.be.working
But what if the output of one program isn't in the format needed for the next? We need some way of processing the output of one program so that it is ready for the next.
Fortunately, there are many Linux programs that do this job: read some input, perform some operations on it, and write the altered data as the output. These programs are called filters. Some filters do quite limited tasks, such as head, grep and sort, whereas others are more flexible, such as sed and awk. In this article, we're going to look at several of these more flexible filters, and give several examples of what can be done with them.
The name “sed” is a contraction of stream editor; sed applies editing commands to a stream of data. A common use for sed is to replace one text pattern with another, as in
sed 's/Fred/Barney/g' foo
This command takes the file foo, changes every occurrence of Fred to Barney, and writes the modified version to standard output.
Note that in this example we have placed the actual sed commands inside single quotes. Sed doesn't require that commands be quoted this way, but you will need to use quotes if the sed command includes characters that are special to the shell, such as $ or *. This example doesn't have any special characters, so we could just as easily have left out the quotes. Try it and see.
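To see a case where the quotes do matter, here is a small sketch. The sample text is made up, and printf is used only to supply some input without needing a file. The pattern contains a space and a *, both of which the shell would otherwise interpret itself.

```shell
# The sed command contains a space and a *, so it must be quoted
# to reach sed as a single, unmodified argument.
# "Fred  *" means: Fred, one space, then zero or more further spaces.
result=$(printf 'Fred   Flintstone\n' | sed 's/Fred  */Barney /')
echo "$result"
```

Without the quotes, the shell would split the command at the spaces and might try to expand the * as a file name pattern before sed ever saw it.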
Without the input file foo, sed reads from standard input, so we could achieve the same result with the command
sed 's/Fred/Barney/g' < foo
or
cat foo | sed 's/Fred/Barney/g'
Note that the first two versions are generally preferred to the third. Using cat just to send input into a pipe creates an extra process which can often be avoided.
We also have to consider the output. By default, the results appear on standard output, but this isn't always what we want. One option is to pipe the output through a pager, for example
sed 's/Fred/Barney/g' foo | more
or to redirect it to a file
sed 's/Fred/Barney/g' foo > bar
While it is often tempting to write
sed 's/Fred/Barney/g' foo > foo
the only thing this achieves is to delete the contents of the file foo! Why? Because the first thing the shell does with this command is to open the file foo for output, destroying what was there already. When sed then tries to read from foo, there is nothing there to read. The result is an empty file. This is an easy mistake to make when redirecting output in this way, so do be careful.
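One common way around this is to write the result to a temporary file and rename it afterwards. The sketch below assumes a scratch directory made with mktemp and a temporary name of foo.tmp, both chosen purely for illustration.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
printf 'Fred went home.\nFred slept.\n' > "$dir/foo"

# Write the edited text to a temporary file first, then replace
# the original only if sed finished successfully.
sed 's/Fred/Barney/g' "$dir/foo" > "$dir/foo.tmp" && mv "$dir/foo.tmp" "$dir/foo"

result=$(cat "$dir/foo")
echo "$result"
```

Because the shell opens foo.tmp rather than foo for output, the original is still intact while sed reads it.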
Awk is a bit more flexible than sed; it is a full-fledged programming language in its own right. However, don't let that put you off. Writing simple programs in awk is surprisingly easy, and it often doesn't feel like a programming language [See page 46 of Linux Journal issue 25, May 1996—ED]. For example, the command
awk '{print NR, $0}' foo
prints the file foo, numbering each line as it goes. Awk can also read its input from a pipe or from standard input, exactly like sed, and also writes on standard output, unless you redirect it. The bit between the quotes (which are necessary, since the {} characters are also special characters to the shell) is the awk program. I said they can be simple, didn't I? An awk program is simply a sequence of one or more pattern-action statements, in the form
pattern { action }
Each input line is tested against each pattern in turn. When an input line matches a pattern, the corresponding action is performed. Either the pattern may be empty, in which case every line matches, or the action may be empty, in which case the default action is to print the line.
In the example above, the pattern was empty, so every line matched. The action was to print NR, which is a built-in awk variable containing the number of lines read so far, and then print $0, which is the current line.
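The three combinations of pattern and action can be tried out with a few lines of made-up input. This is only a sketch; the sample text and the variable names are assumptions for illustration.

```shell
# Sample input; the contents are made up.
input=$(printf 'Fred works\nBarney bowls\nFred sleeps\n')

# Pattern only: the default action prints each matching line.
matches=$(printf '%s\n' "$input" | awk '/Fred/')

# Pattern and action: print the line number of each matching line.
numbers=$(printf '%s\n' "$input" | awk '/Fred/ {print NR}')

# Action only: the empty pattern matches every line.
numbered=$(printf '%s\n' "$input" | awk '{print NR, $0}')

echo "$matches"
echo "$numbers"
echo "$numbered"
```

The first command prints the two lines containing Fred, the second prints 1 and 3, and the third prints all three lines with their numbers in front.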
Now that we've seen the basic idea behind sed and awk, we're going to look at some examples. The best way to learn something is to actually do it, and I recommend that you try out some of these examples yourself as you go along, possibly even with one eye on the man pages. We certainly aren't going to cover everything that sed and awk can do, but you will, it is hoped, have more confidence to try things out yourself once you've finished reading this article.
Our first example is to remove all the spaces from a document. This is easily achieved using sed:
sed 's/ *//g' foo
This is like the earlier example with Fred and Barney, only here we have used a regular expression: ' *' (the quotes are included so that you can see the space that is part of the regular expression). sed's s (for substitute) command uses regular expressions just like grep. The regexp ' *' matches zero or more spaces, so with the g flag every run of spaces is replaced with nothing; that is, the spaces are deleted. This command doesn't deal with tabs as it stands, but you could modify it to match one or more occurrences of either a tab or a space (here {tab} stands for a literal tab character):
sed 's/[ {tab}][ {tab}]*//g' foo
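Typing a literal tab at the shell prompt can be awkward, so one way to try this out is to keep the tab in a shell variable. This is a sketch under that assumption; the variable name tab and the sample input are made up.

```shell
# Store a literal tab character in a shell variable.
tab=$(printf '\t')

# Double quotes are used here so that the shell expands $tab
# inside the bracket expressions before sed sees the command.
result=$(printf 'a \tb  c\n' | sed "s/[ $tab][ $tab]*//g")
echo "$result"
```

The input contains a space, a tab and a pair of spaces between the letters, and all of them are removed.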