Text Manipulation with sed
For those who have never used regular expressions, here are three regular expressions that are very useful when combined with sed:
To match the start of a line, use the ^ character.
To match the end of a line, use the $ character.
To match any number of characters in a regular expression, use the characters .*. The . matches any single character, and the * matches zero or more repetitions of whatever precedes it, so together they match any run of characters (including none at all).
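As a quick illustration of these three building blocks, the sketch below feeds a few made-up lines to sed and prints only those that start with "error" and end with "done". The sample text and the variable name matches are assumptions chosen for the demonstration.

```shell
# Sample input is made up for this demonstration.
# -n suppresses sed's default output; the p command prints matching lines only.
matches=$(printf '%s\n' \
    'error: disk full, retry done' \
    'warning: error ignored, done' \
    'error: still running' |
  sed -n -e '/^error.*done$/p')

echo "$matches"
```

Only the first line satisfies both anchors, so it is the only one printed.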
Filter out empty lines from a file:
sed -e '/^$/d' your_file.txt
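Add the computer named mycomputer to the end of every line in /etc/exports. The output goes to a scratch file first (exports.new is just an illustrative name), because redirecting straight back into /etc/exports would empty the file before it is read:
cat /etc/exports | sed -e 's/$/ mycomputer/' > exports.new
cp exports.new /etc/exports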
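Add the computer named comp2 only to the directories beginning with /data/ in /etc/exports (the result is written to an illustrative scratch file, exports.new, rather than back onto the input file):
cat /etc/exports | sed -e '/^\/data\//s/$/ comp2/' > exports.new
cp exports.new /etc/exports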
See how the forward slashes used in the directory name have to be escaped using backslashes? Without the backslashes, sed interprets the forward slashes in the directory specifier as the delimiters in the sed command itself. However, the backslashes can make the sed command difficult to read and follow.
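One way to avoid the run of backslashes is to pick a different delimiter. The s command accepts another character in place of the slash, and an address regex can use a custom delimiter when it is introduced with a backslash, as in \,regex,. The sketch below applies this to a made-up stand-in for /etc/exports; the sample lines and the variable name result are assumptions.

```shell
# Made-up sample standing in for /etc/exports.
# \,^/data/, is an address that uses commas as its delimiters, and
# s,$, comp2, is a substitution that does the same.
result=$(printf '%s\n' \
    '/data/projects comp1' \
    '/home comp1' |
  sed -e '\,^/data/,s,$, comp2,')

echo "$result"
```

Only the line beginning with /data/ gains the extra host name; the /home line passes through unchanged.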
Remove the first word on each line (including any leading spaces and the trailing space):
cat test3.txt | sed -e 's/^ *[^ ]* //'
More regular expression matching is used in this example. Here's what it is doing.
The initial ^ * is used to match any number of spaces at the beginning of the line. The [^ ]* then matches any number of characters that are not spaces (the ^ inside the brackets reverses the match on the space), so it matches a single word. The trailing space at the end matches the space found at the end of the first word. The empty replace pattern removes the text.
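To see the pieces at work, this sketch runs the same substitution on two made-up lines, one with leading spaces and one without. The sample text is an assumption standing in for test3.txt.

```shell
# Made-up lines standing in for test3.txt.
result=$(printf '%s\n' \
    '   alpha beta gamma' \
    'one two' |
  sed -e 's/^ *[^ ]* //')

echo "$result"
```

The first line comes out as "beta gamma" and the second as "two".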
Remove the last word on each line:
cat test3.txt | sed -e 's/^\(.*\) .*/\1/'
This command introduces the concept of hold buffers, more commonly called back-references or capture groups (they are unrelated to sed's hold space). Hold buffers are used to keep parts of the matched text and to insert that text into the result. The text matched by the pattern between the escaped parentheses is recalled in the substitution pattern by the \1. If an additional set of parentheses were in the match pattern, it would be addressed in the substitution pattern as \2, and so on, for more sets of parentheses. Up to nine hold buffers can be specified. In this example, the pattern contained within the parentheses matches from the start of the line up to the last space (the space after the parentheses); because .* is greedy, it takes as much of the line as it can.
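The sketch below shows \1 and \2 side by side on a made-up line. The variable names are assumptions for the demonstration.

```shell
line='one two three'

# \(.*\) is greedy, so it captures everything up to the last space.
without_last=$(echo "$line" | sed -e 's/^\(.*\) .*/\1/')

# A second group captures the last word; \2 and \1 swap the two parts.
swapped=$(echo "$line" | sed -e 's/^\(.*\) \(.*\)$/\2 \1/')

echo "$without_last"
echo "$swapped"
```

The first command prints "one two"; the second prints "three one two".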
To remove the leading { and the trailing } (or },) from each line:
sed -e 's/^.*{\(.*\)},*/\1/' table.txt
I'll leave it to the reader to dig into this regular expression to see how it operates. Keep this in mind: the more comfortable you are with regular expressions and hold buffers, the more powerful the sed command becomes.
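For readers who want to check their reading of that expression, here is a sketch that runs it on two made-up lines in the style of an initializer table. The contents of table.txt are not shown in the article, so the sample is an assumption.

```shell
# Made-up lines standing in for table.txt.
result=$(printf '%s\n' \
    '{1, 2, 3},' \
    '{4, 5, 6}' |
  sed -e 's/^.*{\(.*\)},*/\1/')

echo "$result"
```

Both lines are reduced to the text between the braces: "1, 2, 3" and "4, 5, 6".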
sed recognizes many other commands. However, even with these basic commands, you can successfully manipulate text files from within your own shell scripts or right from the command line.
Larry Richardson develops meteorological workstation software for 3SI. He has developed software for UNIX and Windows using C and C++ for more than 13 years. Now living in Georgia with his wife and son, he enjoys playing bass in his spare time.

Comments
sed deletion help
Hello all..
plz help me.
I have a doubt regarding deletion using sed. We can use the following command to delete lines 5 through 10 from filename.txt.
sed '5,10d' filename.txt
I have two variables, $startline and $endline. How do I use the sed command with these variables? When I use
sed '$startline,endlined filename.txt
i am getting errors.
I know this is a basic syntax error, but plz help me to solve this.
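One likely cause is quoting: inside single quotes the shell does not expand $startline, and the quoted command above is also missing the $ on endline and its closing quote. A minimal sketch with double quotes, using twelve made-up numbered lines in place of filename.txt:

```shell
startline=5
endline=10

# Double quotes let the shell expand the variables before sed sees the script;
# the braces keep "d" from being read as part of the variable name.
remaining=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 |
  sed "${startline},${endline}d")

echo "$remaining"
```

Lines 5 through 10 are deleted, leaving 1, 2, 3, 4, 11 and 12.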
how to do this: From a file
how to do this:
From a file containing a telephone directory, create a new list from this
file that shows surname first, followed by a comma(,) and then the first
name and rest of the line.
ex- gupta, shiv 98797630
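A sketch of one way to do this with the grouping technique from the article, assuming each directory line has the form "firstname surname rest" with single spaces between the fields. The sample lines are made up to match the example above.

```shell
# Made-up directory lines: first name, surname, then the rest of the line.
result=$(printf '%s\n' \
    'shiv gupta 98797630' \
    'anita rao 91234567' |
  sed -e 's/^\([^ ]*\) \([^ ]*\)/\2, \1/')

echo "$result"
```

The first two words are swapped and joined with a comma, and the rest of each line is left untouched.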
unnecessary pipe
Often you don't need to pipe a "cat file.txt" to sed, you can sed the file directly.
cs
/home/sphinx/TUTORIAL/53/train/raw/u1078.raw
/home/sphinx/TUTORIAL/53/train/raw/u1079.raw
/home/sphinx/TUTORIAL/53/train/raw/u1080.raw
I have the above text in my 777.txt file. What I want is to replace /home/sphinx/TUTORIAL/53/raw/ with blank space, and .raw should also be replaced with blank space. Please help me; I studied sed, awk, cut and regular expressions long ago and have forgotten them.
Assuming by "blank space"
Assuming by "blank space" you mean change them to zero length strings, this should do it:
This will output:
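A command along these lines, sketched under the assumption that the prefix to strip is the /home/sphinx/TUTORIAL/53/train/raw/ that actually appears in the sample lines, reduces each path to its bare name. The printf stands in for 777.txt, and commas are used as the delimiter so the slashes need no escaping.

```shell
# The three paths come from the question; 777.txt is simulated with printf.
result=$(printf '%s\n' \
    '/home/sphinx/TUTORIAL/53/train/raw/u1078.raw' \
    '/home/sphinx/TUTORIAL/53/train/raw/u1079.raw' \
    '/home/sphinx/TUTORIAL/53/train/raw/u1080.raw' |
  sed -e 's,^/home/sphinx/TUTORIAL/53/train/raw/,,' -e 's/\.raw$//')

echo "$result"
```

For the three sample lines this prints u1078, u1079 and u1080, one per line.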
Mitch Frazier is an Associate Editor for Linux Journal.
One Flaw
Yes, the problem is as indicated. Evidently he didn't check that his test bed would work. I was wondering if there was a system to generate unique temporary file names so that you would have something like:
#assign temp but unique file name to TEMP$
sed -f work.cmd database.txt > TEMP$
#analyze TEMP$ to ensure it is OK
mv --force TEMP$ database.txt
It occurred to me that the date command could be used initially to generate a filename.
For example: the date output of Thu Jun 16 15:45:41 PDT 2005 could be massaged to become 2005Jun16154541PDT.txt, which should be unique.
I imagine someone has done this already, but I haven't looked for it (yet).
parl
Check out the man page on
Check out the man page on mktemp
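A minimal sketch of how mktemp could slot into the workflow from the comment above. The file name database.txt and the substitution are placeholders, and the demonstration runs in a scratch directory so nothing real is touched.

```shell
# Work in a scratch directory so the demonstration touches no real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'old entry' 'another old entry' > database.txt

# mktemp creates a uniquely named empty file and prints its name.
tmpfile=$(mktemp ./database.XXXXXX)

# Write to the temporary file, never to the input file itself, and replace
# the original only if sed succeeded and the result is non-empty.
if sed -e 's/old/new/' database.txt > "$tmpfile" && [ -s "$tmpfile" ]; then
    mv -f "$tmpfile" database.txt
fi

final=$(cat "$workdir/database.txt")
echo "$final"
```

The non-empty check is only a stand-in for the "analyze" step; a real script would apply whatever sanity test suits the data.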
sed... a cautionary note on re-directions
Good introductory article to sed.
One observation though:
I would not recommend users issue commands of the form:
$ cat fname.txt |
sed -e 's/something/something else/' > fname.txt
In the above example, which is semantically similar to the examples in the article, the user is asking the shell to use fname.txt as both input and output! Unless the specific command is designed to handle this (e.g., sort, which handles it via the "-o fname" option), relying on the semantics of the shell to handle it is very dangerous. Depending on the shell, the version of the shell, etc., the above example may give the user an empty result file, a truncated file, or a corrupt file. Instead, I would recommend redirecting to an intermediate file and then, after inspecting the results and being satisfied with them, copying the intermediate file back over the original.
sed usage
Yes, this problem exists.
When we use the same file as source and destination, this problem is seen.
For example:
sed -e 's/2/3/' d3.txt > d3.txt
will return an empty file.
This can be dangerous in live scenarios,
so play with caution.
:)
jignesh
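The empty file is a consequence of how the shell sets up redirection: it opens, and truncates, the output file before sed starts, so sed finds nothing to read. A small demonstration in a scratch directory (the file contents are made up, and the shell's noclobber option is assumed to be off, as it is by default):

```shell
# Use a scratch directory so no real file is harmed.
workdir=$(mktemp -d)
printf '%s\n' '1 2 3' '2 2 2' > "$workdir/d3.txt"

# The shell truncates d3.txt before sed runs, so sed reads an empty file.
sed -e 's/2/3/' "$workdir/d3.txt" > "$workdir/d3.txt"

# wc -c may pad its output with spaces on some systems; tr strips them.
size=$(wc -c < "$workdir/d3.txt" | tr -d ' ')
echo "d3.txt is now $size bytes"
```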
Use the "-i" option. "Edit in
Use the "-i" option.
"Edit in place".
Isn't for all
-i option is not always present.
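Where it is available, in-place editing looks like the sketch below. The -i flag is an extension rather than part of POSIX sed: GNU sed accepts a bare -i, while BSD and macOS sed expect a backup suffix after it. Attaching the suffix directly, as in -i.bak, is a form that both families are generally reported to accept. The file name and contents here are made up.

```shell
# Use a scratch directory so no real file is touched.
workdir=$(mktemp -d)
printf '%s\n' 'a 2 b' > "$workdir/d3.txt"

# -i.bak edits d3.txt in place and keeps the original as d3.txt.bak.
sed -i.bak -e 's/2/3/' "$workdir/d3.txt"

edited=$(cat "$workdir/d3.txt")
backup=$(cat "$workdir/d3.txt.bak")
echo "$edited"
echo "$backup"
```

On a sed without -i, writing to a temporary file and moving it over the original remains the fallback.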