Filters: Doing It Your Way
One of the basic philosophies of Linux (as with all flavours of Unix) is that each program does one particular task, and does it well. Often you combine several programs to achieve something, either at the shell prompt or in a script, by piping the output of one program into the next. I'm talking about things like
ls -l | more
and
ps -auxw | \ grep netscape >> people.who.should.be.working
But what if the output of one program isn't in the format needed for the next? We need some way of processing the output of one program so that it is ready for the next.
Fortunately, there are many Linux programs that do this job: read some input, perform some operations on it, and write the altered data as the output. These programs are called filters. Some filters do quite limited tasks, such as head, grep and sort, whereas others are more flexible, such as sed and awk. In this article, we're going to look at several of these more flexible filters, and give several examples of what can be done with them.
The name “sed” is a contraction of stream editor; sed applies editing commands to a stream of data. A common use for sed is to replace one text pattern with another, as in
sed 's/Fred/Barney/g' foo
This command takes the file foo, changes every occurrence of Fred to Barney, and writes the modified version to standard output.
Note that in this example we have placed the actual sed commands inside single quotes. Sed doesn't require that commands be quoted this way, but you will need to use quotes if the sed command includes characters that are special to the shell, such as $ or *. This example doesn't have any special characters, so we could just as easily have left out the quotes. Try it and see.
Without the input file foo, sed reads from standard input, so we could achieve the same result with the command
sed 's/Fred/Barney/g' < foo
or
cat foo | sed 's/Fred/Barney/g'
Note that the first two versions are generally preferred to the third. Using cat just to send input into a pipe creates an extra process which can often be avoided.
We also have to consider the output. By default, the results appear on standard output, but this isn't always what we want. One option is to pipe the output through a pager, for example
sed 's/Fred/Barney/g' foo | more
or to redirect it to a file
sed 's/Fred/Barney/g' foo > bar
While it is often tempting to write
sed 's/Fred/Barney/g' foo > foo
the only thing this achieves is to delete contents of the file foo! Why? Because the first thing the shell does with this command is to open the file foo for output, destroying what was there already. When it tries to read from foo, there is nothing there to read. The result is an empty file. This is an easy mistake to make when redirecting output in this way, so do be careful.
Awk is a bit more flexible than sed; it is a full-fledged programming language in its own right. However, don't let that put you off. Writing simple programs in awk is surprisingly easy, and it often doesn't feel like a programming language [See page 46 of Linux Journal issue 25, May 1996—ED]. For example, the command
awk '{print NR, $0}' foo
prints the file foo, numbering each line as it goes. Awk can also read its input from a pipe or from standard input, exactly like sed, and also writes on standard output, unless you redirect it. The bit between the quotes (which are necessary, since the {} characters are also special characters to the shell) is the awk program. I said they can be simple, didn't I? An awk program is simply a sequence of one or more pattern-action statements, in the form
pattern { action }
Each input line is tested against each pattern in turn. When an input line matches a pattern, the corresponding action is performed. Either the pattern may be empty, in which case every line matches, or the action may be empty, in which case the default action is to print the line.
In the example above, the pattern was empty, so every line matched. The action was to print NR, which is a built-in awk variable containing the number of lines read so far, and then print $0, which is the current line.
Now that we've seen the basic idea behind sed and awk, we're going to look at some examples. The best way to learn something is to actually do it, and I recommend that you try out some of these examples yourself as you go along, possibly even with one eye on the man pages. We certainly aren't going to cover everything that sed and awk can do, but you will, it is hoped, have more confidence to try things out yourself once you've finished reading this article.
Our first example is to remove all the spaces from a document. This is easily achieved using sed:
sed 's/ *//g' foo
This is like the earlier example with Fred and Barney, only here we have used a regular expression: ' *' (the quotes are included so that you can see the space that is part of the regular expression). sed's s (for substitute) command using regular expressions just like grep. The regexp ' *' matches one or more spaces, which are replaced with nothing—they are deleted. This command doesn't deal with tabs, as it stands, but you could modify it to match one or more occurences of either a tab or a space:
sed 's/[ {tab}][ {tab}]*//g' foo
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- RSS Feeds
- What's the tweeting protocol?
- New Products
- Trying to Tame the Tablet
- Dart: a New Web Programming Experience
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.




59 min 2 sec ago
17 hours 47 min ago
20 hours 19 min ago
21 hours 37 min ago
22 hours 12 min ago
22 hours 34 min ago
1 day 3 hours ago
1 day 4 hours ago
1 day 5 hours ago
1 day 7 hours ago