Filters: Doing It Your Way

A look at several of the more flexible filters, probrams that read some input, perform some operation on it, and write the altered data as output.
When Line Numbers Are Not Enough

Selecting part of a file using line numbers is easy enough to do, but often you don't know the line numbers you want. Instead, you want to select lines based on their contents. In awk, we can easily select a line matching a pattern, with

awk '/regexp/' foo

Which causes all lines containing regexp to be printed. There is a direct sed equivalent of this:

sed -n '/regexp/p' foo

Of course, we can also use grep to do this kind of thing:

grep 'regexp' foo

but sed can also handle ranges easily. For example, to get all lines of a file up to and including the first line matching a regexp, you would type:

sed -n '1,/regexp/p' foo

or to get all lines including and after the first line matching regexp:

sed -n '/regexp/,$p' foo

Remember that $ means the last line in a file. You can also specify a range based on two regexps. Try

sed -n '/regexp1/,/regexp2/p' foo

Note that this prints all blocks starting with lines containing regexp1 through lines containing regexp2, not just the first one. If there isn't a matching regexp2 for a line containing regexp1, then we get all lines through to the end of the file.

Now we can select some part of the input, based on a regular expression.

We might want to delete some lines that contain a certain pattern. The d command does just that:

sed '/regexp/d' foo

deletes all lines that match the regexp. Or, we might want to delete a block of text:

sed '/regexp1/,/regexp2/d' foo

deletes everything from a line that contains regexp1, up to and including a line that matches regexp2. Again, sed will select all blocks of text delimited by regexp1 and regexp2, so there is a danger we could delete more than we want to.

Inserting text at a given point is possible, too. The command

sed '/regexp/r bar' foo

inserts the contents of the file bar after any line that matches the regexp in the file foo.

Now, we can combine these last two commands to replace a block of text in a file with the contents of another file. We do it like this:

sed -e '/START/r bar' -e '/START/,/END/d' foo

This finds a line containing START, deletes through to a line containing END, then reads in the contents of the file bar. Because the r command doesn't read in the file until the next input line is read, the d command is executed before the new text is read in, so the d command doesn't delete the new text, as one might expect, looking at this command. The -e option tells sed that the next argument is a command, rather than an input file. Although it is optional when there is only one command, if we have multiple commands, they must each be preceded with -e.


These examples have mostly been line oriented, but we are just as likely to want to deal with columns of data. The filter cut can select columns of data. For example, to list the real names of all the users on your system, you could type

cut -f5 -d: /etc/passwd
The 5 argument after -f tells cut to list the
fifth column (where real names are stored), and the -d
flag is used to tell cut which character delimits the
field—in the case of the password file, it's a colon. To get
both the username (which is in the first column) and the real
name, we could use
cut -f1,5 -d: /etc/passwd

Awk is also good at getting at columns of data, we could do these tasks with the following awk commands:

awk -F: '{print $5}' /etc/passwd


awk -F: '{print $1,$5}' /etc/passwd

where the -F flag tells awk what character the fields are delimited by. (Do you see the difference between using cut and using awk for printing more than one field? If not, try running the commands again and looking more closely.)

One advantage of using awk is that we can perform operations on the columns.

For example, if we want to find out how much disk space the files in the current directory take up, we could total up the fifth column of the output of ls -l:

ls -l | grep -v '^d' | \
  awk '{s += $5} END {print s}'

In this command, we use grep to remove any lines that begin with d, so we don't count directories. We chose grep, but we could just as easily have used awk or sed to do this. One pure awk solution could be:

ls -l | awk '! /^d/ {s += $5} END {print s}'

where the awk program only totals the fifth column of lines that don't begin with a d—the exclamation mark before the pattern tells awk to select lines which don't match the regular expression /^d/.


White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState