Shell Functions and Path Variables, Part 3
Suppose you log on to your UNIX system and discover, for reasons beyond your control, that PATH is full of duplicate entries. (Humour me. It does happen. Maybe your system administrator modified /etc/PATH inadvisedly.) Let's assume these duplicates are making your PATH undesirably long. Is there anything you can do to clean things up? Yes, you can type at the prompt:
$ uniqpath
This will remove any duplicate entries from your path, leaving the order of the remaining pathels intact. For example:
$ NEWP=fred:bill:steve:fred:dave:bill
$ uniqpath -p NEWP
$ echo $NEWP
fred:bill:steve:dave
Let's skip the options-handling code again, and look at the meat:
npath=$(listpath -p $pathvar | awk '{seen[$0]++;
if (seen[$0]==1){print}}')
eval $pathvar=$(makepath "$npath")
As usual, $pathvar contains the name of the
pathvar we want to modify. The code is rather similar to that of
delpath. The first line generates a variable
(npath) containing the unique path elements, and
the second line rebuilds the pathvar from those elements using
makepath. We don't use an external file to store the pathels, but
keep everything in shell variables. This is done in order to
demonstrate an alternative technique—there is no deeper reason.
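The two-step pattern can be tried on its own with a minimal sketch. The helpers split_path and join_path below are hypothetical stand-ins for listpath and makepath, whose real definitions are not shown in this part, and uniq_sketch is an illustrative name rather than the actual uniqpath function:

```shell
# Hypothetical stand-ins for listpath and makepath (assumptions, not the
# article's real helpers).
split_path() { printf '%s\n' "$1" | tr ':' '\n'; }   # one pathel per line
join_path()  { paste -s -d: -; }                      # lines back to a:b:c

# Remove duplicate pathels from the variable whose name is passed in $1,
# keeping the first occurrence of each and preserving order.
uniq_sketch() {
    pathvar=$1
    eval "value=\$$pathvar"                 # fetch the named variable's value
    npath=$(split_path "$value" |
            awk '{seen[$0]++; if (seen[$0]==1) print}' |
            join_path)
    eval "$pathvar=\$npath"                 # assign the result back by name
}

NEWP=fred:bill:steve:fred:dave:bill
uniq_sketch NEWP
echo "$NEWP"    # fred:bill:steve:dave
```

As in the article's version, everything stays in shell variables and no temporary file is needed.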
The first line runs listpath to break the pathvar into separate lines and pipes them through an awk filter which removes duplicate pathels. You may be wondering why we don't just use the uniq program instead of awk's magic. It's because uniq removes duplicate lines from its input only if they happen to be adjacent. In our case, the duplicate pathels will generally not be adjacent, so uniq won't work. “Aha,” you say, “why not use sort -u? That will sort the lines and remove duplicates.” True enough; however, sorting also changes the directory search order if we run uniqpath on PATH. Usually, people care about the order in which their PATH directories are searched, and it's a bad idea to modify it.
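The difference between the three filters is easy to see on the NEWP example. The following sketch feeds the same six lines to each one; the variable names are made up for the illustration, and LC_ALL=C is set only to make the sort order predictable:

```shell
pathels=$(printf '%s\n' fred bill steve fred dave bill)

# uniq only collapses adjacent duplicates, so nothing is removed here.
with_uniq=$(printf '%s\n' "$pathels" | uniq | paste -s -d: -)

# sort -u removes the duplicates but reorders the elements.
with_sort=$(printf '%s\n' "$pathels" | LC_ALL=C sort -u | paste -s -d: -)

# The awk filter removes the duplicates and keeps first-seen order.
with_awk=$(printf '%s\n' "$pathels" |
           awk '{seen[$0]++; if (seen[$0]==1) print}' | paste -s -d: -)

echo "uniq:    $with_uniq"    # fred:bill:steve:fred:dave:bill
echo "sort -u: $with_sort"    # bill:dave:fred:steve
echo "awk:     $with_awk"     # fred:bill:steve:dave
```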
Thus, we have the awk solution. This uses a powerful feature of awk known as an associative array or hash (if you have a Perl background). If you're a C programmer, you'll know what an array is: a group of objects of the same type, indexed by an integer. The contents of an array can be accessed by expressions like values[0] or values[20], which refer to the first and twenty-first elements, respectively. A hash is rather like an array which can be indexed by an arbitrary string of characters. So, in awk notation, we could write
age["bill"]=27
to assign 27 to the hash element indexed by the string bill in the hash called age. Let's look at the awk code shown above.
Between the single quotes, we have a block of code run each time awk reads a new line from its standard input. When awk reads a line, it is stored in a special variable called $0, and we use $0 as an index into a hash called seen. (We haven't declared this anywhere; that's okay in awk. Variables spring into existence, with numerical value 0, when they appear in the code.) We use the seen hash to tell us whether awk has already seen an identical line of input since it started executing. Let's see what happens in the NEWP example shown above.
First, listpath splits NEWP into lines containing the following strings: “fred”, “bill”, “steve”, “fred”, “dave” and “bill”, which are read in that order by awk. awk stores each line it reads in $0, so $0 takes on the values “fred”, “bill” and so on, in turn. Each time a line is read, the corresponding element of the seen hash is incremented (by the statement seen[$0]++), and the line is printed only if it has been seen exactly once (by the print statement in the if block, which prints $0 to standard output by default). Consider the hash element seen["fred"]: it is initially 0, is set to 1 when awk reads the first “fred” line, remains at 1 for the next two lines, and is set to 2 when awk reads the second “fred” line. The “fred” line is printed only when it is seen for the first time. C programmers should note how syntactically elegant this solution is and how little code is required when compared to the equivalent in C.
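To watch the counters change, the filter can be modified to report each count instead of silently printing or skipping the line. This is only a tracing sketch of the same logic; the verdict variable and the output format are additions for the illustration:

```shell
trace=$(printf '%s\n' fred bill steve fred dave bill |
awk '{
    seen[$0]++                                     # count this line
    verdict = (seen[$0] == 1) ? "print" : "skip"   # first sighting only
    printf "%-5s count=%d %s\n", $0, seen[$0], verdict
}')
echo "$trace"
# fred  count=1 print
# bill  count=1 print
# steve count=1 print
# fred  count=2 skip
# dave  count=1 print
# bill  count=2 skip
```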
The final pathvar function we're going to see is edpath. This breaks the pathels in a pathvar into separate lines, writes them to a temporary file and runs an editor on that file. You can edit the pathels to your heart's content and quit from the editor when you're finished. The pathvar is then reconstructed from the modified lines in the file. edpath allows you to perform arbitrary modifications on a pathvar. I use it most often when I wish to swap the order of directories in PATH.
The code for edpath is fairly straightforward (ignoring once again the boring details of option handling):
TEMP=/tmp/edpath.out.$$
VAR=\$$pathvar                      # VAR="$LIBPATH" for example
eval export OLD$pathvar=$VAR        # store old path in e.g. OLDPATH
listpath -p $pathvar > $TEMP        # write path elements to file
${EDITOR:-vi} $TEMP                 # edit the file
eval $pathvar=$(makepath < $TEMP)   # reconstruct path
/bin/rm -f $TEMP                    # remove temporary file
Let's skip the first three lines for now. The real work is done by the block of code starting with listpath. This follows a pattern similar to that of delpath and uniqpath. First, we separate the pathels in the pathvar using listpath, but this time, we redirect the output into a temporary file. The next line edits that file. The expression ${EDITOR:-vi} may be unfamiliar; it means “Use the value of the EDITOR variable if it is non-null, else use vi.” This allows the user to specify his favourite editor by setting the EDITOR environment variable (to Emacs, perhaps) but uses vi if he has not done so. Note that the edit command is run in the foreground, so the shell will wait until the editor process terminates before running any more commands from the shell function. When this occurs, the modified pathvar will be reconstructed by the line starting with eval. If you read the description of delpath given above, you'll know how this line works.
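The behaviour of ${EDITOR:-vi} can be checked at the prompt without starting an editor. The sketch below just captures what the expression expands to in three situations; the variable names when_unset, when_empty and when_set are made up for the demonstration:

```shell
unset EDITOR
when_unset=${EDITOR:-vi}     # EDITOR is unset, so the default is used

EDITOR=
when_empty=${EDITOR:-vi}     # EDITOR is set but null, so the default is used

EDITOR=emacs
when_set=${EDITOR:-vi}       # EDITOR is non-null, so its value is used

echo "$when_unset $when_empty $when_set"    # vi vi emacs
```

Note that the expansion never assigns to EDITOR; it only supplies a fallback value at the point of use.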
Lines 2 and 3 of the code are a safety net. They store the initial value of the pathvar to be edited in a new environment variable. If the user is editing PATH, for example, then the code creates a variable called OLDPATH. If the user makes unwanted modifications to her PATH, she can simply type:
$ PATH=$OLDPATH
and all will be well.
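The two safety-net lines can be tried in isolation. In this sketch the pathvar is LIBPATH and its value is made up for the example; the assignment to /tmp/oops simply simulates an unwanted edit:

```shell
pathvar=LIBPATH                 # name of the pathvar being edited
LIBPATH=/usr/lib:/opt/lib       # example value, invented for this sketch

VAR=\$$pathvar                  # VAR now holds the literal text $LIBPATH
eval export OLD$pathvar=$VAR    # eval runs: export OLDLIBPATH=$LIBPATH

LIBPATH=/tmp/oops               # simulate an unwanted modification
LIBPATH=$OLDLIBPATH             # restore from the saved copy
echo "$LIBPATH"                 # /usr/lib:/opt/lib
```

The backslash in the VAR assignment keeps the first dollar sign literal, so the variable name is only expanded to its value later, when eval processes the line a second time.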
Comments
Source Code Availability - pathfunc.tgz
Great article! Can the code be made available?? Email will do.
Thanks,
Rick
Where's the source code?
The ftp link at the end is password protected. Shouldn't it be open to the public?