Filters: Doing It Your Way
One of the basic philosophies of Linux (as with all flavours of Unix) is that each program does one particular task, and does it well. Often you combine several programs to achieve something, either at the shell prompt or in a script, by piping the output of one program into the next. I'm talking about things like
ls -l | more
and
ps -auxw | grep netscape >> people.who.should.be.working
But what if the output of one program isn't in the format needed for the next? We need some way of processing the output of one program so that it is ready for the next.
Fortunately, there are many Linux programs that do this job: read some input, perform some operations on it, and write the altered data as the output. These programs are called filters. Some filters do quite limited tasks, such as head, grep and sort, whereas others are more flexible, such as sed and awk. In this article, we're going to look at several of these more flexible filters, and give several examples of what can be done with them.
The name “sed” is a contraction of stream editor; sed applies editing commands to a stream of data. A common use for sed is to replace one text pattern with another, as in
sed 's/Fred/Barney/g' foo
This command takes the file foo, changes every occurrence of Fred to Barney, and writes the modified version to standard output.
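If you'd like to see this for yourself, here is a small sketch. The contents of foo are made up for the illustration, and the scratch directory exists only to keep the experiment out of your real files. It also shows what the trailing g is for: without it, sed replaces only the first match on each line.

```shell
# Work in a throwaway directory (assumption: mktemp is available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data; only the file name foo comes from the article.
printf 'Fred met Fred.\nWilma waved.\n' > foo

# With g: every occurrence of Fred on each line is replaced.
with_g=$(sed 's/Fred/Barney/g' foo)

# Without g: only the first occurrence on each line is replaced.
without_g=$(sed 's/Fred/Barney/' foo)

printf '%s\n' "$with_g"
printf '%s\n' "$without_g"
```

The first version should print "Barney met Barney." as its first line, the second "Barney met Fred."; the file foo itself is left untouched in both cases.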
Note that in this example we have placed the actual sed commands inside single quotes. Sed doesn't require that commands be quoted this way, but you will need to use quotes if the sed command includes characters that are special to the shell, such as $ or *. This example doesn't have any special characters, so we could just as easily have left out the quotes. Try it and see.
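To see what the shell does to an unquoted special character, you can try a sketch like the following. The file names price1 and price2 are invented for the demonstration, and echo is used only to show what a command would actually receive.

```shell
# Work in a throwaway directory so the only files present are ours.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch price1 price2        # hypothetical file names

# Unquoted, the shell replaces price* with the matching file names
# before the command ever sees it.
unquoted=$(echo price*)

# Inside single quotes, the * reaches the command untouched.
quoted=$(echo 'price*')

printf '%s\n' "$unquoted" "$quoted"

# The same protection lets a sed command containing * arrive intact.
# Here each run of spaces is squeezed down to a single space.
squeezed=$(printf 'a    b\n' | sed 's/  */ /g')
printf '%s\n' "$squeezed"
```

Getting into the habit of single-quoting every sed command costs nothing and saves you from having to work out whether a particular command needs it.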
If no input file is named, sed reads from standard input, so we could achieve the same result with the command
sed 's/Fred/Barney/g' < foo
or
cat foo | sed 's/Fred/Barney/g'
Note that the first two versions are generally preferred to the third. Using cat just to send input into a pipe creates an extra process which can often be avoided.
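A quick way to convince yourself that the three forms really are equivalent is to capture each result and compare them. This is only a sketch, with made-up contents for foo:

```shell
# Work in a throwaway directory (assumption: mktemp is available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Fred\nWilma\n' > foo   # hypothetical sample data

# 1. File named on the command line.
from_arg=$(sed 's/Fred/Barney/g' foo)

# 2. File attached to standard input by the shell.
from_redirect=$(sed 's/Fred/Barney/g' < foo)

# 3. File fed through a pipe by an extra cat process.
from_pipe=$(cat foo | sed 's/Fred/Barney/g')

if [ "$from_arg" = "$from_redirect" ] && [ "$from_redirect" = "$from_pipe" ]; then
    echo "all three forms agree"
fi
```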
We also have to consider the output. By default, the results appear on standard output, but this isn't always what we want. One option is to pipe the output through a pager, for example
sed 's/Fred/Barney/g' foo | more
or to redirect it to a file
sed 's/Fred/Barney/g' foo > bar
While it is often tempting to write
sed 's/Fred/Barney/g' foo > foo
the only thing this achieves is to delete the contents of the file foo! Why? Because the first thing the shell does with this command is to open the file foo for output, destroying what was there already. When sed then tries to read from foo, there is nothing there to read. The result is an empty file. This is an easy mistake to make when redirecting output in this way, so do be careful.
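One common way around the problem is to write to a temporary file and move it into place only if sed succeeded. The sketch below shows both the mistake and the workaround; the names foo.tmp and clobbered are my own choices for the illustration, not anything sed requires.

```shell
# Work in a throwaway directory (assumption: mktemp is available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The mistake: the shell truncates the file before sed gets to read it.
printf 'Fred\n' > clobbered
sed 's/Fred/Barney/g' clobbered > clobbered
if [ ! -s clobbered ]; then
    echo "clobbered is now empty"
fi

# The workaround: write elsewhere first, then replace the original.
# The && means foo is only overwritten if sed exited successfully.
printf 'Fred\n' > foo
sed 's/Fred/Barney/g' foo > foo.tmp && mv foo.tmp foo
result=$(cat foo)
printf '%s\n' "$result"
```

After this runs, foo should contain the single line Barney, while clobbered is left with nothing in it.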
Awk is a bit more flexible than sed; it is a full-fledged programming language in its own right. However, don't let that put you off. Writing simple programs in awk is surprisingly easy, and it often doesn't feel like a programming language [See page 46 of Linux Journal issue 25, May 1996—ED]. For example, the command
awk '{print NR, $0}' foo
prints the file foo, numbering each line as it goes. Awk can also read its input from a pipe or from standard input, exactly like sed, and also writes on standard output, unless you redirect it. The bit between the quotes (which are necessary, since the {} characters are also special characters to the shell) is the awk program. I said they can be simple, didn't I? An awk program is simply a sequence of one or more pattern-action statements, in the form
pattern { action }
Each input line is tested against each pattern in turn. When an input line matches a pattern, the corresponding action is performed. Either the pattern may be empty, in which case every line matches, or the action may be empty, in which case the default action is to print the line.
In the example above, the pattern was empty, so every line matched. The action was to print NR, which is a built-in awk variable containing the number of lines read so far, and then print $0, which is the current line.
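Here is a small sketch of the three possibilities: an action with no pattern, a pattern with no action, and both together. The data in foo is invented, and the last command uses $1 and $2, awk's names for the first and second whitespace-separated fields of a line, which we haven't discussed yet.

```shell
# Work in a throwaway directory (assumption: mktemp is available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Fred 10\nWilma 20\nBarney 30\n' > foo   # hypothetical sample data

# Empty pattern: the action runs for every line, numbering as it goes.
numbered=$(awk '{ print NR, $0 }' foo)

# Empty action: lines matching the pattern are simply printed.
pattern_only=$(awk '/Wilma/' foo)

# Pattern and action: for lines whose second field exceeds 15,
# print the line number and the first field.
both=$(awk '$2 > 15 { print NR, $1 }' foo)

printf '%s\n' "$numbered" "$pattern_only" "$both"
```

The first command prints all three lines with 1, 2 and 3 in front of them, the second prints only "Wilma 20", and the third prints "2 Wilma" and "3 Barney".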
Now that we've seen the basic idea behind sed and awk, we're going to look at some examples. The best way to learn something is to actually do it, and I recommend that you try out some of these examples yourself as you go along, possibly even with one eye on the man pages. We certainly aren't going to cover everything that sed and awk can do, but you will, it is hoped, have more confidence to try things out yourself once you've finished reading this article.
Our first example is to remove all the spaces from a document. This is easily achieved using sed:
sed 's/ *//g' foo
This is like the earlier example with Fred and Barney, only here we have used a regular expression: ' *' (the quotes are included so that you can see the space that is part of the regular expression). sed's s (for substitute) command uses regular expressions just like grep. Strictly speaking, the regexp ' *' matches zero or more spaces; with the g flag, every run of spaces is replaced with nothing, that is, deleted, and the empty matches in between change nothing. This command doesn't deal with tabs as it stands, but you could modify it to match one or more occurrences of either a tab or a space (here {tab} stands for a literal tab character):
sed 's/[ {tab}][ {tab}]*//g' foo
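Typing a literal tab at the shell prompt can be awkward, so here is one possible way to try the command out. This sketch stores a tab in a shell variable (the name tab is my own) and uses double quotes so that the shell substitutes it into the sed command. The last line uses the POSIX character class [[:blank:]], which matches a space or a tab; that is an alternative I am suggesting, not part of the command above.

```shell
# Work in a throwaway directory (assumption: mktemp is available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample line containing one space, one tab and two spaces.
printf 'a b\tc  d\n' > foo

# Spaces only: the tab survives.
spaces_only=$(sed 's/ *//g' foo)

# Spaces and tabs: a tab held in a variable stands in for {tab}.
# Double quotes are needed here so the shell expands $tab.
tab=$(printf '\t')
no_blanks=$(sed "s/[ $tab][ $tab]*//g" foo)

# Alternative spelling using a POSIX character class.
no_blanks_class=$(sed 's/[[:blank:]]//g' foo)

printf '%s\n' "$spaces_only" "$no_blanks" "$no_blanks_class"
```

The first result still has a tab between b and c, while the other two should both come out as abcd.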