Filters: Doing It Your Way
Selecting part of a file using line numbers is easy enough to do, but often you don't know the line numbers you want. Instead, you want to select lines based on their contents. In awk, we can easily select the lines matching a pattern with
awk '/regexp/' foo
which prints every line containing a match for regexp. There is a direct sed equivalent of this:
sed -n '/regexp/p' foo
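To see that the two commands really are equivalent, run them side by side on the same input. This is a minimal sketch assuming a POSIX shell; the sample file contents and the pattern `an` are made up for illustration, and `mktemp` is used only to get a scratch file.

```shell
# Build a small throwaway input file (contents are illustrative only).
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' apple banana cherry mango > "$f"

# Print only the lines that contain "an", once with awk and once with sed.
by_awk=$(awk '/an/' "$f")
by_sed=$(sed -n '/an/p' "$f")

printf '%s\n' "$by_awk"    # banana, then mango
[ "$by_awk" = "$by_sed" ] && echo "awk and sed agree"
```

The -n flag matters: without it, sed prints every line by default and the p command prints matching lines a second time.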
Of course, we can also use grep to do this kind of thing:
grep 'regexp' foo
but sed can also handle ranges easily. For example, to get all lines of a file up to and including the first line matching a regexp, you would type:
sed -n '1,/regexp/p' foo
or, to get the first line matching regexp and every line after it:
sed -n '/regexp/,$p' foo
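Here are both range forms on a small made-up file, with the literal string MARK standing in for regexp. This is a sketch assuming a POSIX shell and sed; note that `$` sits inside single quotes, so the shell passes it to sed untouched.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' header one MARK two three > "$f"

# Line 1 through the first line matching MARK, inclusive.
upto=$(sed -n '1,/MARK/p' "$f")
# The first line matching MARK through the last line ($).
from=$(sed -n '/MARK/,$p' "$f")

printf '%s\n' "$upto"    # header, one, MARK
printf '%s\n' "$from"    # MARK, two, three
```

One subtlety: in `1,/regexp/` the end address is only tested from line 2 onward, so if line 1 itself matches, the range runs on to the next match. GNU sed accepts `0,/regexp/` to cover that case; it is a GNU extension, not POSIX.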
Remember that $ means the last line in a file. You can also specify a range based on two regexps. Try
sed -n '/regexp1/,/regexp2/p' foo
Note that this prints all blocks starting with lines containing regexp1 through lines containing regexp2, not just the first one. If there isn't a matching regexp2 for a line containing regexp1, then we get all lines through to the end of the file.
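The following sketch makes both behaviors visible: a file with two complete BEGIN/END blocks and a third BEGIN that is never closed. The file contents and marker words are invented for the demonstration.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' a BEGIN b END c BEGIN d END e BEGIN f > "$f"

# Every BEGIN..END block is printed, not just the first one. The last
# BEGIN has no END after it, so sed keeps printing to the end of file.
blocks=$(sed -n '/BEGIN/,/END/p' "$f")
printf '%s\n' "$blocks"
```

The lines a, c, and e fall outside every block and are not printed, while the trailing f is printed only because the final block is never terminated.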
Now we can select parts of the input based on regular expressions.
We might want to delete some lines that contain a certain pattern. The d command does just that:
sed '/regexp/d' foo
deletes all lines that match the regexp. Or, we might want to delete a block of text:
sed '/regexp1/,/regexp2/d' foo
deletes everything from a line that contains regexp1 up to and including the next line that matches regexp2. Again, sed will select every block of text delimited by regexp1 and regexp2, so there is a danger we could delete more than we intended.
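A small sketch of both forms of d, plus the danger just mentioned. It assumes a POSIX shell; the sample lines, the comment-stripping pattern `^#`, and the START/END markers are illustrative choices, not taken from the text.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' keep1 '# comment' keep2 START drop END keep3 > "$f"

# Delete every line that starts with '#'.
no_comments=$(sed '/^#/d' "$f")
# Delete each START..END block, including both marker lines.
no_block=$(sed '/START/,/END/d' "$f")
# Danger: with no END after START, everything from START onward is deleted.
truncated=$(printf '%s\n' one START two three | sed '/START/,/END/d')

printf '%s\n' "$no_block"     # keep1, # comment, keep2, keep3
printf '%s\n' "$truncated"    # one
```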
Inserting text at a given point is possible, too. The command
sed '/regexp/r bar' foo
inserts the contents of the file bar after any line that matches the regexp in the file foo.
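Here is the r command on two throwaway files named after the text's foo and bar. This is a sketch assuming a POSIX shell; the marker word HERE is made up. The script is in double quotes so the shell substitutes the temporary file's path; sed takes everything after `r ` as the file name.

```shell
foo=$(mktemp); bar=$(mktemp)
trap 'rm -f "$foo" "$bar"' EXIT
printf '%s\n' alpha HERE omega > "$foo"
printf '%s\n' inserted > "$bar"

# Append the contents of $bar after each line of $foo matching HERE.
result=$(sed "/HERE/r $bar" "$foo")
printf '%s\n' "$result"    # alpha, HERE, inserted, omega
```

Note that the text goes after the matching line, not before it. Also, if the named file does not exist or cannot be read, sed treats it as empty and reports no error, so a typo in the file name fails silently.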
Now, we can combine these last two commands to replace a block of text in a file with the contents of another file. We do it like this:
sed -e '/START/r bar' -e '/START/,/END/d' foo
This finds a line containing START, deletes through to a line containing END, then reads in the contents of the file bar. Because the r command doesn't write out the file's contents until sed reads the next input line, the d command is executed before the new text appears; so, contrary to what the order of the commands might suggest, d doesn't delete the new text. The -e option tells sed that the next argument is a command, rather than an input file. It is optional when there is only one command, but if we pass multiple commands as separate arguments, each one must be preceded by -e.
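The following sketch runs the replace-a-block idiom end to end, so you can confirm that the new text survives the d command. It assumes GNU sed (the usual sed on Linux); the file contents and the START/END markers are illustrative.

```shell
foo=$(mktemp); bar=$(mktemp)
trap 'rm -f "$foo" "$bar"' EXIT
printf '%s\n' before START old1 old2 END after > "$foo"
printf '%s\n' new1 new2 > "$bar"

# r queues bar's contents when START is seen; d then deletes START..END.
# The queued text is still written out, so the block is replaced.
replaced=$(sed -e "/START/r $bar" -e '/START/,/END/d' "$foo")
printf '%s\n' "$replaced"    # before, new1, new2, after
```

The two commands must be separate -e arguments (or separate lines of a script file) because r treats the rest of its line as the file name, so they cannot be joined with a semicolon.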
These examples have mostly been line-oriented, but we are just as likely to want to deal with columns of data. The filter cut can select columns of data. For example, to list the real names of all the users on your system, you could type
cut -f5 -d: /etc/passwd
The 5 argument after -f tells cut to list the fifth column (where real names are stored), and the -d flag tells cut which character delimits the fields; in the case of the password file, it's a colon. To get both the username (which is in the first column) and the real name, we could use
cut -f1,5 -d: /etc/passwd
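Because every system's /etc/passwd is different, this sketch runs the same two cut commands on a made-up two-line file in passwd format (name:password:uid:gid:real name:home:shell). The user entries are invented for illustration.

```shell
pw=$(mktemp)
trap 'rm -f "$pw"' EXIT
printf '%s\n' \
  'alice:x:1000:1000:Alice Smith:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob Jones:/home/bob:/bin/sh' > "$pw"

names=$(cut -f5 -d: "$pw")              # fifth field only
users_and_names=$(cut -f1,5 -d: "$pw")  # first and fifth fields

printf '%s\n' "$names"              # Alice Smith, Bob Jones
printf '%s\n' "$users_and_names"    # alice:Alice Smith, bob:Bob Jones
```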
Awk is also good at getting at columns of data; we could do these tasks with the following awk commands:
awk -F: '{print $5}' /etc/passwd
and
awk -F: '{print $1,$5}' /etc/passwd
where the -F flag tells awk what character the fields are delimited by. (Do you see the difference between using cut and using awk for printing more than one field? If not, try running the commands again and looking more closely.)
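If you want to check your answer, the sketch below prints one made-up passwd-style line three ways. It assumes a POSIX awk and cut; the sample entry is invented. cut keeps the input delimiter between the selected fields, while awk's print separates them with the output field separator OFS, a space by default. Setting OFS to a colon with -v makes awk match cut.

```shell
pw=$(mktemp)
trap 'rm -f "$pw"' EXIT
printf '%s\n' 'alice:x:1000:1000:Alice Smith:/home/alice:/bin/bash' > "$pw"

via_cut=$(cut -f1,5 -d: "$pw")                          # alice:Alice Smith
via_awk=$(awk -F: '{print $1,$5}' "$pw")                # alice Alice Smith
via_awk_ofs=$(awk -F: -v OFS=: '{print $1,$5}' "$pw")   # alice:Alice Smith
```

Another difference: cut always emits the selected fields in their input order, so `-f5,1` gives the same result as `-f1,5`, whereas awk prints fields in whatever order you list them.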
One advantage of using awk is that we can perform operations on the columns.
For example, if we want to find out how much disk space the files in the current directory take up, we could total up the fifth column of the output of ls -l:
ls -l | grep -v '^d' | \
awk '{s += $5} END {print s}'
In this command, we use grep to remove any lines that begin with d, so we don't count directories. We chose grep, but we could just as easily have used awk or sed to do this. One pure awk solution could be:
ls -l | awk '! /^d/ {s += $5} END {print s}'
where the awk program totals the fifth column only for lines that don't begin with a d; the exclamation mark before the pattern tells awk to select lines that don't match the regular expression /^d/.
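To check the pure awk pipeline against known sizes, this sketch builds a scratch directory with two small files and one subdirectory. It assumes that ls -l puts the file size in the fifth field, which holds for GNU coreutils ls when owner and group names contain no spaces. The `s + 0` in the END block is a small addition of ours so that an empty directory prints 0 rather than an empty line.

```shell
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT
printf 'abc' > "$d/a"      # 3 bytes
printf 'hello' > "$d/b"    # 5 bytes
mkdir "$d/sub"             # its ls -l line begins with d, so it is skipped

# The "total" line has no fifth field and adds nothing to the sum.
total=$(cd "$d" && ls -l | awk '! /^d/ {s += $5} END {print s + 0}')
echo "$total"              # 8
```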