Shell Functions and Path Variables, Part 3

A continuation of our introduction to path variables and elements.

uniqpath

Suppose you log on to your UNIX system and discover, for reasons beyond your control, that PATH is full of duplicate entries. (Humour me. It does happen. Maybe your system administrator modified /etc/PATH inadvisedly.) Let's assume these duplicates are making your PATH undesirably long. Is there anything you can do to clean things up? Yes, you can type at the prompt:

$ uniqpath

This will remove any duplicate entries from your path, leaving the order of the remaining pathels intact. For example:

$ NEWP=fred:bill:steve:fred:dave:bill
$ uniqpath -p NEWP
$ echo $NEWP
fred:bill:steve:dave

Let's skip the options-handling code again, and look at the meat:

npath=$(listpath -p $pathvar | awk '{seen[$0]++; if (seen[$0]==1){print}}')
eval $pathvar=$(makepath "$npath")

As usual, $pathvar contains the name of the pathvar we want to modify. The code is rather similar to that of delpath. The first line generates a variable (npath) containing the unique path elements, and the second line rebuilds the pathvar from those elements using makepath. We don't use an external file to store the pathels, but keep everything in shell variables. This is done in order to demonstrate an alternative technique—there is no deeper reason.

The first line runs listpath to break the pathvar into separate lines and pipes them through an awk filter which removes duplicate pathels. You may be wondering why we don't just use the uniq program instead of awk's magic. It's because uniq removes duplicate lines from its input only if they happen to be adjacent. In our case, the duplicate pathels will generally not be adjacent, so uniq won't work. “Aha,” you say, “why not use sort -u? That will sort the lines and remove duplicates.” True enough; however, it would also change the directory search order if we ran uniqpath on PATH. Usually, people care about the order in which their PATH directories are searched, and it's a bad idea to modify it.
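The difference between the three approaches is easy to check at the prompt. The following is a small sketch rather than part of the pathvar functions: it uses the NEWP elements from the example above, and it assumes tr and paste as stand-ins for listpath and makepath so that the snippet runs on its own.

```shell
# Sample pathels, one per line (tr stands in for listpath here).
pathels=$(printf '%s\n' 'fred:bill:steve:fred:dave:bill' | tr ':' '\n')

# uniq collapses only adjacent duplicates, so nothing is removed.
with_uniq=$(printf '%s\n' "$pathels" | uniq | paste -s -d: -)

# sort -u removes the duplicates but reorders the elements.
# LC_ALL=C pins the collation so the order is predictable.
with_sort=$(printf '%s\n' "$pathels" | LC_ALL=C sort -u | paste -s -d: -)

# The awk filter removes the duplicates and keeps the original order.
with_awk=$(printf '%s\n' "$pathels" |
    awk '{seen[$0]++; if (seen[$0]==1){print}}' | paste -s -d: -)

echo "uniq:    $with_uniq"
echo "sort -u: $with_sort"
echo "awk:     $with_awk"
```

Only the awk result has both properties we want: no duplicates and an unchanged search order.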

Thus, we have the awk solution. This uses a powerful feature of awk known as an associative array or hash (if you have a Perl background). If you're a C programmer, you'll know what an array is: a group of objects of the same type, indexed by an integer. The contents of an array can be accessed by expressions like values[0] or values[20], which refer to the first and twenty-first elements, respectively. A hash is rather like an array which can be indexed by an arbitrary string of characters. So, in awk notation, we could write

age["bill"]=27

to assign 27 to the hash element indexed by the string bill in the hash called age. Let's look at the awk code shown above.

Between the single quotes, we have a block of code run each time awk reads a new line from its standard input. When awk reads a line, it is stored in a special variable called $0, and we use $0 as an index into a hash called seen. (We haven't declared this anywhere; that's okay in awk. Variables spring into existence, with numerical value 0, when they appear in the code.) We use the seen hash to tell us whether awk has already seen an identical line of input since it started executing. Let's see what happens in the NEWP example shown above.

First, listpath splits NEWP into lines containing the following strings: “fred”, “bill”, “steve”, “fred”, “dave” and “bill”, which are read in that order by awk. awk stores each line it reads in $0, so $0 takes on the values “fred”, “bill” and so on, in turn. Each time a line is read, the corresponding element of the seen hash is incremented (by the statement seen[$0]++), and the line is printed only if it has now been seen exactly once (by the print statement in the if block, which prints $0 to standard output by default). Consider the hash element seen["fred"]: it is initially 0, is set to 1 when awk reads the first “fred” line, remains at 1 for the next two lines, and is set to 2 when awk reads the second “fred” line. The “fred” line is therefore printed only the first time it is seen. C programmers should note how syntactically elegant this solution is and how little code is required when compared to the equivalent in C.
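To watch the counters change, the filter can be instrumented so that it reports each line, its running count and the decision taken. This is only an illustrative sketch: the input is fed in with printf rather than listpath, and the words "print" and "skip" are labels invented for the trace.

```shell
# Feed the six pathels to awk and report what the filter would do with each.
trace=$(printf '%s\n' fred bill steve fred dave bill |
    awk '{ seen[$0]++
           # A count of 1 means this is the first occurrence of the line.
           printf "%s seen=%d %s\n", $0, seen[$0], (seen[$0] == 1 ? "print" : "skip") }')
echo "$trace"
```

The fourth and sixth lines of the trace show a count of 2, which is why the second “fred” and the second “bill” never reach the output. Many awk users write the same filter more tersely as awk '!seen[$0]++', which behaves identically.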

edpath

The final pathvar function we're going to see is edpath. This breaks the pathels in a pathvar into separate lines, writes them to a temporary file and runs an editor on that file. You can edit the pathels to your heart's content and quit from the editor when you're finished. The pathvar is then reconstructed from the modified lines in the file. edpath allows you to perform arbitrary modifications on a pathvar. I use it most often when I wish to swap the order of directories in PATH.

The code for edpath is fairly straightforward (ignoring once again the boring details of option handling):

TEMP=/tmp/edpath.out.$$
VAR=\$$pathvar                # VAR="$LIBPATH" for example
eval export OLD$pathvar=$VAR  # store old path in
                              # e.g. OLDPATH
listpath -p $pathvar > $TEMP  # write path
                              # elements to file
${EDITOR:-vi} $TEMP           # edit the file
eval $pathvar=$(makepath < $TEMP) # reconstruct path
/bin/rm -f $TEMP              # remove temporary file

Let's skip the first three lines for now. The real work is done by the block of code starting with listpath. This follows a pattern similar to that of delpath and uniqpath. First, we separate the pathels in the pathvar using listpath, but this time, we redirect the output into a temporary file. The next line edits that file. The expression ${EDITOR:-vi} may be unfamiliar; it means “Use the value of the EDITOR variable if it is non-null, else use vi.” This allows the user to specify his favourite editor by setting the EDITOR environment variable (to Emacs, perhaps) but uses vi if he has not done so. Note that the edit command is run in the foreground, so the shell will wait until the editor process terminates before running any more commands from the shell function. When this occurs, the modified pathvar will be reconstructed by the line starting with eval. If you read the description of delpath given above, you'll know how this line works.
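The behaviour of the ${EDITOR:-vi} expansion can be confirmed without starting an editor at all. The snippet below is a minimal sketch; the variable names fallback, empty_case and chosen are invented for the demonstration, and it overwrites EDITOR, so try it in a scratch shell.

```shell
# With EDITOR unset, the expansion falls back to vi.
unset EDITOR
fallback=${EDITOR:-vi}

# Because of the colon, an empty (null) value is treated the same way.
EDITOR=
empty_case=${EDITOR:-vi}

# A non-null value is used as it stands.
EDITOR=emacs
chosen=${EDITOR:-vi}

echo "$fallback $empty_case $chosen"
```

Note that the expansion only supplies a default for that one command; it does not assign anything to EDITOR itself.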

Lines 2 and 3 of the code are a safety net. They store the initial value of the pathvar to be edited in a new environment variable. If the user is editing PATH, for example, then the code creates a variable called OLDPATH. If the user makes unwanted modifications to her PATH, she can simply type:

$ PATH=$OLDPATH

and all will be well.
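The two safety-net lines rely on eval performing a second round of expansion, which is easier to follow with concrete values. The following sketch uses DEMOPATH, a made-up pathvar name with illustrative contents, so that your real PATH is left alone.

```shell
# Illustrative values only; DEMOPATH is a hypothetical pathvar.
DEMOPATH=/usr/bin:/bin
pathvar=DEMOPATH

VAR=\$$pathvar                 # VAR now holds the literal text $DEMOPATH
eval export OLD$pathvar=$VAR   # eval runs: export OLDDEMOPATH=$DEMOPATH

DEMOPATH=/tmp/oops             # simulate an unwanted edit
DEMOPATH=$OLDDEMOPATH          # restore from the saved copy
echo "$DEMOPATH"
```

The backslash in the VAR assignment keeps the first dollar sign literal, so the variable's value is substituted only when eval re-reads the command.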
