Removing Duplicate PATH Entries: Reboot


In my first post on removing duplicate PATH entries I used an AWK one-liner. In the second post I used a Perl one-liner, or more accurately, I tried to dissect a Perl one-liner provided by reader Shaun. Shaun had asked: if I was willing to use AWK (rather than Bash), why not use Perl? It occurred to me that one might also ask: why not just use Bash? So, one more time into the void.


For those who made it through the second post, don't worry; this one should be mercifully short by comparison. And although I could make it a one-liner by using lots of semicolons, I won't do that either.

The approach remains pretty much the same as in the AWK and Perl versions of the code: split the path on colons, use an associative array to determine whether a path element has been seen before, and then join the non-duplicate path elements back together with colons.

Note that I incorrectly stated in my previous posts that empty path elements also could be eliminated in this process. In reality, an empty path element is the same as putting "." in your path; it means the current directory. This error on my part was pointed out in a comment to the second post. I guess I should have read the man page.

To split the PATH into its individual elements, I'll change bash's field separator (IFS) to a colon and then assign the PATH variable to an array:

IFS=: ipaths=($PATH)

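Note the assignment to IFS on the same line as the assignment to the array. Because the line contains no command name, both assignments take effect in the current shell, so IFS stays set to a colon until it is changed back. If you haven't seen this form before, it's standard bash syntax: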

A simple command is a sequence of optional variable assignments followed by blank-separated words and redirections, and terminated by a control operator.
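As a small illustration of the split, here is a sketch that runs on a made-up sample string rather than the real PATH. The names sample, parts, and old_ifs are mine, not from the original code, and saving and restoring IFS is an addition so that the colon setting does not linger in the current shell.

```shell
#!/usr/bin/env bash
# Hypothetical sample path: one duplicate and one empty element in the middle.
sample='/usr/bin:/bin::/usr/bin'

old_ifs=$IFS             # remember the current field separator
IFS=: parts=($sample)    # IFS is set first, then $sample is split on colons
IFS=$old_ifs             # put the original separator back

printf '%d elements\n' "${#parts[@]}"
printf '[%s]\n' "${parts[@]}"
```

This should report four elements, with the empty third element preserved as []. Since the expansion is unquoted, it is also subject to pathname expansion, so a path element containing * or ? could be expanded against the filesystem; running set -f beforehand would be one way to guard against that.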

Now that I have the path elements in the ipaths array, I crank up an associative array and test each element to see if it's in the array (again, remember that in bash, array elements that don't exist evaluate to blank):

declare -A a    # Need to declare the array as associative
for p in "${ipaths[@]}"
do
    [[ -z "${a[$p]}" ]]  &&  a[$p]=1  &&  opaths+=":$p"
done

The loop steps through each path element in the ipaths array and builds the new path in a variable named opaths. The body of the loop tests whether the current path element's array entry is blank ([[ -z "${a[$p]}" ]]), which means that the path has not been seen before. If it hasn't, it sets the path element's array entry to something non-blank (a[$p]=1) and then appends the path element to the variable containing the output path (opaths+=":$p"); note that the colons are added here.

Unfortunately, that fails for blank path elements: blank associative array keys are not allowed in bash. To fix that, I'll use a separate variable to determine if a blank path has been seen before. Another option would be to change "::" to something like "*CURRENT_DIR*" before processing the path and then change it back afterward. The new loop looks like this:

declare -A a
for p in "${ipaths[@]}"
do
    if [[ -z "$p" ]]; then
        [[ -z "$currdir" ]]  &&  currdir=1  &&  opaths+=":"
    else
        [[ -z "${a[$p]}" ]]  &&  a[$p]=1  &&  opaths+=":$p"
    fi
done

Since all the output paths are preceded by a colon, even the first one, the final output path needs to have the first colon removed. I do this with a simple substring evaluation:

export PATH="${opaths:1}"
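To see what the substring expansion does when the very first path element is blank, here is a tiny sketch with a hypothetical opaths value (the value is made up for illustration). Only the one colon added by the loop is removed, so a leading blank element survives.

```shell
#!/usr/bin/env bash
# Hypothetical loop result whose first path element was blank:
# one colon added by the loop, plus the colon that precedes /usr/bin.
opaths='::/usr/bin:/bin'

trimmed="${opaths:1}"    # substring starting at offset 1
echo "$trimmed"          # prints :/usr/bin:/bin
```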

Short and sweet. It's not as short as the Perl version or the original AWK version, but I could just put the code above into a bash function and pretend this version is really short:

export PATH="$(remove_path_dupes)"
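For the curious, here is one way the remove_path_dupes function might look. This is only a sketch assembled from the loop above: the optional argument, the local variables, and the array name seen are my assumptions, and associative arrays need bash 4.0 or later.

```shell
#!/usr/bin/env bash
# Sketch of a remove_path_dupes function built from the loop in this post.
# Taking the path as an optional first argument is a hypothetical convenience;
# with no argument it works on $PATH.
remove_path_dupes() {
    local IFS=:                  # colon splitting stays local to the function
    local -a ipaths
    local -A seen                # associative arrays require bash 4.0+
    local p currdir= opaths=

    ipaths=(${1-$PATH})          # unquoted on purpose so it splits on colons

    for p in "${ipaths[@]}"; do
        if [[ -z $p ]]; then
            # Blank element (current directory): keep only the first one.
            [[ -z $currdir ]] && currdir=1 && opaths+=":"
        else
            [[ -z ${seen[$p]} ]] && seen[$p]=1 && opaths+=":$p"
        fi
    done

    printf '%s\n' "${opaths:1}"  # drop the leading colon
}

remove_path_dupes '/usr/bin:/bin::/usr/bin::/bin:/opt/x'
# prints /usr/bin:/bin::/opt/x
```

Declaring IFS as local keeps the colon separator from leaking into the calling shell, which the bare assignment used earlier does not do.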

I can only hope that this is the last time I write about removing duplicates from the PATH variable, but I can't make any guarantees.

Edit: the following is an addendum to my original post.

A comment by a reader "pepa65" on the previous posts suggested another all-bash solution, and quite frankly, it's a much slicker solution than mine:

IPATH='/usr/bin:/usr/local/bin::/usr/bin:/some folder/j:'
OPATH=$(n= IFS=':'; for e in $IPATH; do [[ :$n == *:$e:* ]] || n+=$e:; done; echo "${n:0: -1}")
echo "$IPATH"
echo "$OPATH"

To make it a bit easier to see, I'll unroll the part inside the command substitution (the $(...) expression):

n= IFS=':'
for e in $IPATH
do
    [[ :$n == *:$e:* ]]  ||  n+=$e:
done
echo "${n:0: -1}"

Rather than using an associative array to see if a path element is in the output path, it simply uses a glob comparison to see if the path element is already found in the output path variable.
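The colons on both sides of the element are what keep a partial match such as /usr from being mistaken for /usr/bin. The sketch below isolates that test; the helper name is_listed and the sample value of n are made up for illustration, and quoting "$1" (so that glob characters in a path element are taken literally) is my addition, not part of the original one-liner.

```shell
#!/usr/bin/env bash
# Hypothetical output so far; every entry is followed by a colon.
n='/usr/bin:/usr/local/bin:'

# Hypothetical helper wrapping the test used in the loop.
is_listed() {
    [[ :$n == *:"$1":* ]]
}

is_listed /usr/bin && echo "/usr/bin is already listed"
is_listed /usr     || echo "/usr is not listed"

# The space before -1 is required: without it, ${n:0:-1} is still fine,
# but ${n: -1} versus ${n:-1} shows why the habit matters, since :- means
# "use a default value". A negative length needs bash 4.2 or later.
echo "${n:0: -1}"    # prints /usr/bin:/usr/local/bin
```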

At first you may be wondering why this even works, because in the section on pathname expansion (aka glob), the bash man page states:

After word splitting, unless the -f option has been set, bash scans each word for the characters *, ?, and [. If one of these characters appears, then the word is regarded as a pattern, and replaced with an alphabetically sorted list of filenames matching the pattern...

And we certainly don't want the asterisks in the pattern to expand to a list of filenames. But that's not a concern, because in the section about [[ expression ]] evaluation, the man page states:

... Word splitting and pathname expansion are not performed on the words between the [[ and ]] ...
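A quick way to convince yourself of this is the following sketch, which uses two made-up variables. Unquoted, neither the value with a space nor the lone asterisk is split or expanded inside the double brackets.

```shell
#!/usr/bin/env bash
dir='/some folder/j'     # hypothetical value containing a space
star='*'                 # a lone glob character

# The right-hand sides are quoted, so they are compared literally.
[[ $dir == '/some folder/j' ]] && echo "unquoted \$dir stays one word"
[[ $star == '*' ]]             && echo "unquoted \$star is not expanded to filenames"
```

Both messages should print, regardless of which files exist in the current directory.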

So pathname expansion does not happen. To get to why it actually works, continue reading the [[ expression ]] section, and a bit further down you'll see:

When the == and != operators are used, the string to the right of the operator is considered a pattern and matched according to the rules described below under Pattern Matching, as if the extglob shell option were enabled. ...

Note that, as with the bash regular expression operator =~, these operators are not symmetric when pattern matching is involved: the pattern must be on the right-hand side. So the following won't work:

[[ *:$e:* == :$n ]]  ||  n+=$e:
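The asymmetry is easy to check with a small sketch. The sample values and the variable names forward, reversed, and quoted are made up for illustration; the third test shows that quoting the right-hand side also turns the pattern back into a literal string.

```shell
#!/usr/bin/env bash
n='/usr/bin:'     # hypothetical output so far
e='/usr/bin'      # element being tested

if [[ :$n == *:$e:* ]];   then forward=match;  else forward='no match';  fi
if [[ *:$e:* == :$n ]];   then reversed=match; else reversed='no match'; fi
if [[ :$n == "*:$e:*" ]]; then quoted=match;   else quoted='no match';   fi

echo "pattern on the right:        $forward"     # match
echo "pattern on the left:         $reversed"    # no match
echo "quoted pattern on the right: $quoted"      # no match
```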

Great solution! I'll declare it the winner.

Mitch Frazier is an embedded systems programmer at Emerson Electric Co. Mitch has been a contributor to and a friend of Linux Journal since the early 2000s.
