Text Manipulation with sed
For those who have never used regular expressions, here are three regular expressions that are very useful when combined with sed:
To match the start of a line, use the ^ character.
To match the end of a line, use the $ character.
To match any number of characters in a regular expression, use the characters .*. The . matches any single character, and the * matches zero or more repetitions of the expression that precedes it, so .* matches any run of characters (including none at all).
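As a quick illustration of the three expressions above, here is a minimal sketch that runs sed over a few made-up lines. The sample text and the variable names are assumptions chosen for illustration; they are not from the article.

```shell
# Sample input; the text is made up for illustration.
sample=$(printf '%s\n' 'apple pie' 'green apple' 'apple' 'plum')

# ^ anchors the match to the start of the line.
starts=$(printf '%s\n' "$sample" | sed -n -e '/^apple/p')

# $ anchors the match to the end of the line.
ends=$(printf '%s\n' "$sample" | sed -n -e '/apple$/p')

# .* matches any run of characters, so ^g.*e$ matches a whole line
# that starts with g and ends with e.
wild=$(printf '%s\n' "$sample" | sed -e 's/^g.*e$/fruit/')

printf '%s\n' "$starts" '---' "$ends" '---' "$wild"
```

The first command prints only the lines that begin with apple, the second only the lines that end with apple, and the third replaces the one line that begins with g and ends with e.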
Filter out empty lines from a file:
sed -e '/^$/d' your_file.txt
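Add the computer named mycomputer to the end of every line in /etc/exports, writing the result to a new file (redirecting the output straight back onto /etc/exports would truncate the file before sed had read it):
sed -e 's/$/ mycomputer/' /etc/exports > /etc/exports.new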
Add the computer named comp2 only to the directories beginning with /data/ in /etc/exports, writing the result to a new file rather than back onto the input:
sed -e '/^\/data\//s/$/ comp2/' /etc/exports > /etc/exports.new
See how the forward slashes used in the directory name have to be escaped using backslashes? Without the backslashes, sed interprets the forward slashes in the directory specifier as the delimiters in the sed command itself. However, the backslashes can make the sed command difficult to read and follow.
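One way to avoid the escaped slashes, offered here as a sketch rather than as the article's method: an address regular expression can use a different delimiter when that delimiter is introduced with a backslash. The sample lines below stand in for /etc/exports and are made up for illustration.

```shell
# Made-up lines standing in for /etc/exports.
exports=$(printf '%s\n' '/data/projects host1' '/home host1' '/data/scratch host1')

# \|...| makes | the delimiter of the address regex, so the slashes in
# /data/ need no escaping. The s command keeps / as its delimiter because
# its pattern ($) contains no slashes.
result=$(printf '%s\n' "$exports" | sed -e '\|^/data/|s/$/ comp2/')

printf '%s\n' "$result"
```

Only the two lines that begin with /data/ gain the trailing comp2; the /home line passes through unchanged.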
Remove the first word on each line (including any leading spaces and the trailing space):
cat test3.txt | sed -e 's/^ *[^ ]* //'
More regular expression matching is used in this example. Here's what it is doing.
The initial ^ * is used to match any number of spaces at the beginning of the line. The [^ ]* then matches any number of characters that are not spaces (the ^ inside the brackets negates the match on the space), so it matches a single word. The trailing space at the end matches the space found at the end of the first word. The empty replacement pattern removes the matched text.
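To see the pieces described above at work, here is a small sketch on a single made-up line with leading spaces (the sample text and variable names are assumptions for illustration):

```shell
# A made-up line with three leading spaces.
line='   first second third'

# ^ * consumes the leading spaces, [^ ]* consumes "first", and the final
# space in the pattern consumes the space after it.
result=$(printf '%s\n' "$line" | sed -e 's/^ *[^ ]* //')

printf '%s\n' "$result"   # prints: second third
```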
Remove the last word on each line:
cat test3.txt | sed -e 's/^\(.*\) .*/\1/'
This command introduces back-references, which this article refers to as hold buffers (they are distinct from sed's hold space, which is manipulated with commands such as h and g). Hold buffers keep parts of the matched text so that the text can be inserted into the result. The text matched by the pattern between the escaped parentheses \( and \) is recalled in the substitution pattern by \1. If an additional set of parentheses were in the match pattern, it would be addressed in the substitution pattern as \2, and so on for more sets of parentheses. Up to nine hold buffers can be specified. In this example, the pattern contained within the parentheses matches from the start of the line up to the last space (the space after the parentheses); because .* is greedy, it takes as much of the line as it can.
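The following sketch shows \1 on its own and then \1 and \2 together. The sample lines are made up for illustration, and the second command is an additional example, not one from the article.

```shell
# \1 recalls everything before the last space, which drops the last word.
trimmed=$(printf '%s\n' 'one two three' | sed -e 's/^\(.*\) .*/\1/')

# Two sets of parentheses: \1 is the first word and \2 the second,
# so the replacement swaps them.
swapped=$(printf '%s\n' 'hello world' | sed -e 's/^\([^ ]*\) \([^ ]*\)$/\2 \1/')

printf '%s\n' "$trimmed"   # prints: one two
printf '%s\n' "$swapped"   # prints: world hello
```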
To remove a leading { and a trailing } (or a trailing },) from each line:
sed -e 's/^.*{\(.*\)},*/\1/' table.txt
I'll leave it to the reader to dig into this regular expression to see how it operates. Keep this in mind: the more comfortable you are with regular expressions and hold buffers, the more powerful the sed command becomes.
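For readers who want to check their reading of that expression, here is a sketch that runs it on two made-up lines. The contents of table.txt are not shown in the article, so the sample input is an assumption.

```shell
# Made-up stand-in for table.txt: one line ends in "}," and one in "}".
table=$(printf '%s\n' '{1, 2, 3},' '{4, 5, 6}')

# ^.*{   matches everything up to and including the last {
# \(.*\) captures the text up to the last }
# },*    matches that } and any commas that follow it
result=$(printf '%s\n' "$table" | sed -e 's/^.*{\(.*\)},*/\1/')

printf '%s\n' "$result"
```

Each line is reduced to the text that sat between the braces.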
sed recognizes many other commands. However, even with these basic commands, you can successfully manipulate text files from within your own shell scripts or right from the command line.
Larry Richardson develops meteorological workstation software for 3SI. He has developed software for UNIX and Windows using C and C++ for more than 13 years. Now living in Georgia with his wife and son, he enjoys playing bass in his spare time.
Comments
sed deletion help
Hello all..
plz help me.
I have a doubt regarding deletion using sed. We can use the following command to delete lines 5 through 10 from filename.txt.
sed '5,10d' filename.txt
I have two variables, $startline and $endline. How do I use the sed command with these variables? When I use
sed '$startline,endlined filename.txt
i am getting errors.
I know this is a basic syntax error, but plz help me to solve this.
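One likely fix for the question above, as a sketch: single quotes stop the shell from expanding variables, so the sed script needs double quotes, and braces keep the d command from being read as part of the variable name. The sample file created here is hypothetical; only the variable names startline and endline come from the question.

```shell
# Hypothetical sample file with twelve numbered lines, in a scratch directory.
workdir=$(mktemp -d) || exit 1
i=1
while [ "$i" -le 12 ]; do
    printf 'line %s\n' "$i"
    i=$((i + 1))
done > "$workdir/filename.txt"

startline=5
endline=10

# Double quotes let the shell substitute the values, giving sed the script 5,10d.
result=$(sed -e "${startline},${endline}d" "$workdir/filename.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Lines 5 through 10 are deleted, leaving lines 1 to 4 followed by lines 11 and 12.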
how to do this:
From a file containing a telephone directory, create a new list that shows the surname first, followed by a comma (,) and then the first name and the rest of the line.
ex- gupta, shiv 98797630
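A possible approach to the question above, as a sketch. It assumes each line of the directory has the form "firstname surname rest-of-line" with single spaces between fields; the question does not show the input format, and the sample entries are made up.

```shell
# Made-up directory entries in the assumed "firstname surname number" form.
directory=$(printf '%s\n' 'shiv gupta 98797630' 'anita rao 98111222')

# \1 is the first name and \2 the surname; the rest of the line is not
# part of the match, so it is left as it was.
result=$(printf '%s\n' "$directory" | sed -e 's/^\([^ ]*\) \([^ ]*\)/\2, \1/')

printf '%s\n' "$result"
```

With this input the first line becomes gupta, shiv 98797630. Names that contain spaces would need a different pattern.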
unnecessary pipe
Often you don't need to pipe a "cat file.txt" to sed, you can sed the file directly.
cs
/home/sphinx/TUTORIAL/53/train/raw/u1078.raw
/home/sphinx/TUTORIAL/53/train/raw/u1079.raw
/home/sphinx/TUTORIAL/53/train/raw/u1080.raw
I have the above text in my 777.txt file. What I want is to replace /home/sphinx/TUTORIAL/53/raw/ with blank space, and .raw should also be replaced with blank space. Please help me, I forget how; I studied sed, awk, cut and regular expressions long ago.
Assuming by "blank space" you mean change them to zero length strings, this should do it:
This will output:
Mitch Frazier is an Associate Editor for Linux Journal.
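The command from the reply above is not shown here. One command that fits its description, as a sketch: it assumes the prefix to remove is the one that actually appears in the sample lines (including the train/ component), and it uses | as the s delimiter so the slashes need no escaping.

```shell
# The three paths from the question, standing in for 777.txt.
paths=$(printf '%s\n' \
    '/home/sphinx/TUTORIAL/53/train/raw/u1078.raw' \
    '/home/sphinx/TUTORIAL/53/train/raw/u1079.raw' \
    '/home/sphinx/TUTORIAL/53/train/raw/u1080.raw')

# First expression removes the directory prefix, second removes the
# .raw suffix; both replace the match with a zero-length string.
result=$(printf '%s\n' "$paths" |
    sed -e 's|^/home/sphinx/TUTORIAL/53/train/raw/||' -e 's|\.raw$||')

printf '%s\n' "$result"
```

For these three lines the result is u1078, u1079 and u1080, one per line.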
One Flaw
Yes, the problem is as indicated. Evidently he didn't check that his test bed would work. I was wondering if there was a system to generate unique temporary file names so that you would have something like:
# assign a temporary but unique file name to $TEMP
sed -f work.cmd database.txt > "$TEMP"
# analyze $TEMP to ensure it is OK
mv --force "$TEMP" database.txt
It occurred to me that the date command could be used initially to generate a filename.
For example: the date output of Thu Jun 16 15:45:41 PDT 2005 could be massaged to become 2005Jun16154541PDT.txt, which should be unique.
I imagine someone has done this already, but I haven't looked for it (yet).
parl
Check out the man page on mktemp
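A minimal sketch of the temporary-file workflow discussed above, using mktemp. The file names work.cmd and database.txt follow the earlier comment; their contents and the scratch directory are made up so that the example runs on its own. mktemp is widely available but is not part of POSIX, so treat its presence as an assumption.

```shell
# Scratch directory with made-up stand-ins for database.txt and work.cmd.
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'alpha 2' 'beta 2' > "$workdir/database.txt"
printf '%s\n' 's/2/3/' > "$workdir/work.cmd"

# mktemp creates a uniquely named file and prints its name.
tmpfile=$(mktemp "$workdir/edit.XXXXXX") || exit 1
sed -f "$workdir/work.cmd" "$workdir/database.txt" > "$tmpfile"

# A very simple check: only replace the original if the output is not empty.
if [ -s "$tmpfile" ]; then
    mv -f "$tmpfile" "$workdir/database.txt"
fi

result=$(cat "$workdir/database.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

The non-empty check is only a placeholder for whatever inspection the data actually calls for before the original is overwritten.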
sed... a cautionary note on re-directions
Good introductory article to sed.
One observation though:
I would not recommend users issue command of the form:
$ cat fname.txt |
sed -e 's/something/something else/' > fname.txt
In the above example, which is semantically similar to the examples in the article, the user is asking the shell to use fname.txt as both input and output! Unless the specific command is designed to handle this (e.g., sort, which handles it via the "-o fname" option), relying on the semantics of the shell to handle it is very dangerous. Depending on the shell, the version of the shell, etc., the above example may actually give the user an empty result file, a truncated file, or a corrupt file. Instead, I would recommend redirecting to an intermediate file and then, after inspecting the results and being satisfied with them, copying the intermediate file back over the original.
sed usage
Yes, this problem exists.
It is seen when we use the same file as both source and destination.
For example:
sed -e 's/2/3/' d3.txt > d3.txt
will leave you with an empty file.
This can be dangerous in live scenarios,
so play with caution.
:)
jignesh
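The behaviour described above can be checked safely in a scratch directory. This is a sketch with a made-up one-line file; the shell truncates the output file while setting up the redirection, before sed starts, so sed finds nothing to read.

```shell
# Scratch directory so that no real file is harmed.
workdir=$(mktemp -d) || exit 1
printf '%s\n' '1 2 3' > "$workdir/d3.txt"

# Same file as input and output: the redirection empties d3.txt first.
sed -e 's/2/3/' "$workdir/d3.txt" > "$workdir/d3.txt"

# Count the bytes left in the file.
size=$(wc -c < "$workdir/d3.txt" | tr -d ' ')
printf 'size after edit: %s bytes\n' "$size"
rm -rf "$workdir"
```

With sed reading the file directly, as here, the file ends up empty. When the input arrives through a pipe from cat, the outcome depends on timing and is less predictable.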
Use the "-i" option.
"Edit in place".
Isn't for all
-i option is not always present.
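Where sed does support -i, a sketch of its use follows. The option is an extension rather than part of POSIX sed, and implementations differ in how they treat a missing backup suffix, so this example attaches a suffix directly to the option (-i.bak), a form that GNU sed and BSD sed both accept. The file and its contents are made up.

```shell
# Scratch directory with a made-up one-line file.
workdir=$(mktemp -d) || exit 1
printf '%s\n' '1 2 3' > "$workdir/d3.txt"

# Edit the file in place, keeping the original as d3.txt.bak.
sed -i.bak -e 's/2/3/' "$workdir/d3.txt"

edited=$(cat "$workdir/d3.txt")
backup=$(cat "$workdir/d3.txt.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"
rm -rf "$workdir"
```

On a sed without -i, writing to a temporary file and moving it over the original remains the portable fallback.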