Polishing the wegrep Wrapper Script

When last I discussed shell scripts, I was presenting a shell script that offered an alternative to the -C context flag in GNU grep. Although most modern Linux systems have the more capable grep command, older systems may lack this particular feature, and it's a good excuse to dig into working with wrapper scripts too.

"Wait. What's a wrapper script?" I can hear you ask, and some of you also are now trying to think of a famous rapper whose name you can reference for a punny response. I've already beaten you there: "Can't touch that!"

A wrapper is a script that stands in for a command on the Linux system and calls that command behind the scenes, adding capabilities and features along the way. When you have an alias set up so that every invocation of ls is really ls -F, that's the same basic idea.
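The ls example also can be written as a shell function, which is a wrapper in miniature. Here's a minimal sketch, assuming a POSIX shell; the function shadows ls and relies on the command builtin to reach the real program so it doesn't end up calling itself:

```shell
# Wrapper in miniature: every call to ls gains the -F flag.
# "command" bypasses the function so the real ls runs underneath.
ls() {
  command ls -F "$@"
}
```

With that defined, a plain ls in the same shell session marks directories with a trailing slash, just as the alias would.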

Linux and its grizzled father UNIX are really powerful because they offer these sorts of capabilities; it's hard to write a wrapper for Microsoft Excel on a Windows 10 system, by contrast.

A command with multiple versions in the wild is a perfect example of where a wrapper can be especially beneficial. Imagine you're deploying a few hundred servers and want to run a bare-bones Linux on them to maximize available cycles. Problem is, your admin scripts rely on the very latest-and-greatest versions of sed, grep and find. Solution? Point the scripts at your wrapper versions of those commands, and make sure every flag you need is implemented, either in the base command (as would be the case on the newer systems) or through the wrapper code itself.
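One way to decide between the base command and the wrapper is to probe for the flag at runtime. What follows is only a sketch: has_context_flag and choose_grep are hypothetical helper names, and ./wegrep.sh is an assumed location for the wrapper.

```shell
# Probe whether a grep binary understands -C by running it on a
# one-line input; returns 0 when the flag works, non-zero otherwise.
# has_context_flag is a hypothetical helper, not part of wegrep.
has_context_flag() {
  printf 'x\n' | "${1:-grep}" -C 1 x >/dev/null 2>&1
}

# Print the command to use: the native one when it is capable
# enough, otherwise the wrapper (./wegrep.sh is an assumed path).
choose_grep() {
  if has_context_flag "${1:-grep}"; then
    echo "${1:-grep}"
  else
    echo "./wegrep.sh"
  fi
}
```

Keep in mind that the wrapper in this column takes only a pattern and a filename, so an admin script still would need to call it with those two arguments rather than with arbitrary grep flags.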

So, back to wegrep. When last I left this script, it offered up the base -C functionality of giving one or more lines of context before and after each match to a grep search. Left on the to-do list were to make it smarter about when to add the "- - - - - -" divider line, to add line numbers and to highlight the actual match.

Let's start with making the script smarter with the divider line, because that's by far the easiest. Like any script that tries to separate multiple blocks of output neatly, the key is really to count how many times the output has been sent. Here's the solution:


if [ $matches -eq 0 ] ; then
  echo "-----"
fi
matches=$(( $matches + 1 ))

This appears prior to each block of output. The very first time it produces the top divider line, and otherwise it's skipped. After the matching line or lines, however, there's another divider line that is included each and every time.
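To watch the counter at work outside the script, here's a minimal sketch. The show_block function is a hypothetical stand-in for one pass through the script's loop, with its argument playing the role of the matching lines:

```shell
matches=0

# Print one block of output framed by divider lines. The top divider
# is emitted only for the first block; every block ends with its own.
show_block() {
  if [ "$matches" -eq 0 ]; then
    echo "-----"
  fi
  printf '%s\n' "$1"
  echo "-----"
  matches=$(( matches + 1 ))
}
```

Calling show_block twice in a row produces three divider lines rather than four, because the second call skips the top divider.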

Adding line numbers can be accomplished a number of ways, but I'm going to exploit an interesting capability of the sed command itself, the "=" command, which prints the current line number. Let me demonstrate with the wonderland.txt data file that contains the first couple paragraphs of Alice in Wonderland:


$ head -5 wonderland.txt | sed =
1
------------------------------------------------------
2

3
ALICE'S ADVENTURES IN WONDERLAND
4

5
Lewis Carroll

You can see what it does, I hope? It adds line numbers, but each number shows up on a line of its own, ahead of the line it refers to. It's a bit funky, but a second sed invocation fixes the problem and gives output that makes a lot more sense:


$ head -5 wonderland.txt | sed = | sed 'N;s/\n/:   /'
1:      ------------------------------------------------
2:
3:      ALICE'S ADVENTURES IN WONDERLAND
4:
5:      Lewis Carroll

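In the above, N appends the next input line to the one sed is already holding, and the substitution then replaces the newline between the two. The replacement sequence is a colon followed by the Tab character itself, which can be entered by typing Ctrl-V followed by Tab. That's easily done in scripts.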
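If typing a literal Tab feels fragile, printf can generate one instead. Here's a minimal sketch of the same two-step numbering trick; number_lines is a hypothetical helper name, and it reads its input from a pipe rather than from wonderland.txt:

```shell
# Number the lines arriving on stdin as "N:<TAB>text".
# The first sed prints each line number on a line of its own;
# the second joins every number with the line that follows it.
number_lines() {
  tab=$(printf '\t')
  sed = | sed "N;s/\n/:${tab}/"
}
```

For instance, head -5 wonderland.txt | number_lines should reproduce the numbered listing shown above.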

So, that's two down: a smarter divider line and the ability to number the output lines. Let's see how that works:


$ sh wegrep.sh '^Alice' wonderland.txt
-----
12:
13:     ^Alice was beginning to get very tired of sitting by
14:     her sister on the bank, and of having nothing to do:
-----
27:     There was nothing so very remarkable in that; nor did
28:     ^Alice think it so very much out of the way to hear the
29:     Rabbit say to itself, 'Oh dear! Oh dear! I shall be
-----

The dividers work perfectly, showing up the minimum amount needed to denote each matching block of lines clearly, and the line numbers are neat and helpful.

The trickier part is still left to tackle. How do you actually highlight the match in each section?

ANSI Color Sequences

You may not realize it, but odds are incredibly high that your Terminal or xterm window, whether you're directly in a Linux system or connecting via a Windows or Mac computer, is emulating what's known as an ANSI terminal.

ANSI is the American National Standards Institute, but don't be misled; this is a global standard, particularly when it comes to colors, bold and other visual aspects to the terminal.

The problem is, the sequences to turn on and turn off bold or specific colors have to be fairly obscure to ensure that users don't accidentally end up invoking them. So "color:" would be a fail, as would "<color>". Instead, it's done through an escape sequence: Escape + [ + 3 + 2 + m causes all subsequent text to be rendered as green, for example.

The Escape + [ sequence prefix has a name of its own. It's a Control Sequence Introducer, although you probably don't need to know that! You can find a full table of ANSI color sequences on-line.

Once you're done with the highlighted text, you'll need to change the display back to regular text, and that's done with the sequence Escape + [ + 0 + m.
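The on and off sequences are easy to try in isolation. This is just a sketch, with green as a hypothetical helper name; \033 is the octal code for the Escape character:

```shell
# Wrap the first argument in the ANSI sequences for green text.
# Passing the text through %s keeps any percent signs or
# backslashes in it from being interpreted by printf.
green() {
  printf '\033[32m%s\033[0m' "$1"
}
```

Typing green Alice ; echo in a terminal that honors ANSI sequences should show the name in green, with the prompt that follows back in the regular color.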

Add them all up, and here's what you use to highlight whatever value is stored as $1 in a string:


\033[32m$1\033[0m

The \033 is octal notation for the Escape character (ASCII 27). Rather than make this an echo statement, it's a good use of printf, which handles backslash escapes consistently from shell to shell, so here's the sequence:


sed "/$1/s//$(printf '\033[32m%s\033[0m' "$1")/" "$2"

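This basically replaces the first match of $1 on each line with the text of $1, prefixed with the ANSI green sequence and suffixed with the sequence to return subsequent text to its normal display characteristics. Append a g flag to the substitution if you want every match on a line highlighted.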

I'm being a bit lazy here by exploiting how the script works too. If it can show matching lines from a file, it also can show matching lines that have had the ANSI sequences slipped in. So here's the new flow, and it's a bit more complicated than my original stab at this script:


sed "/$1/s//$(printf '\033[32m%s\033[0m' "$1")/" "$2" | \
sed = | sed 'N;s/\n/:  /' | \
sed -n "${before},${after}p"

Four invocations of sed in a row—ah, I love Linux!

In the above, the first sed invocation adds the ANSI sequences, the second and third work together to add the line number prefixes, and the fourth shows the lines in the stream from the range $before to $after.
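The two ends of that range are plain integer arithmetic around the matching line number. Here's a minimal sketch using a hypothetical helper, context_range. The clamp at 1 is my own assumption about how to treat a match near the top of the file, since sed line addresses start at 1:

```shell
# Print the "first,last" line range for a match, widened by the
# requested number of context lines on each side.
# context_range is a hypothetical helper, not part of wegrep.
context_range() {
  match=$1
  context=$2
  before=$(( match - context ))
  after=$(( match + context ))
  # Assumption: clamp the lower bound so it never drops below line 1.
  if [ "$before" -lt 1 ]; then
    before=1
  fi
  echo "${before},${after}"
}
```

A match on line 13 with one line of context gives 12,14, which can be handed straight to sed, as in sed -n "$(context_range 13 1)p" wonderland.txt.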

To see how those are calculated, here's the full script:


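#!/bin/sh
# wegrep - grep with context, line numbers and highlighted matches
grep=/usr/bin/grep
sed=/usr/bin/sed
context=1      # lines of context shown before and after each match
matches=0      # number of blocks printed so far
if [ $# -ne 2 ] ; then
  echo "Usage: wegrep pattern filename" >&2 ; exit 1
fi
# grep -n prefixes each matching line with its line number
for match in $($grep -n -E "$1" "$2" | cut -d: -f1)
do
  before=$(( match - context ))
   after=$(( match + context ))
  # sed line addresses start at 1, so clamp a match near the top
  if [ "$before" -lt 1 ] ; then
    before=1
  fi
  if [ "$matches" -eq 0 ] ; then
    echo "-----"
  fi
  $sed "/$1/s//$(printf '\033[32m%s\033[0m' "$1")/" "$2" | \
    $sed = | $sed 'N;s/\n/:       /' | \
    $sed -n "${before},${after}p"
  echo "-----"
  matches=$(( matches + 1 ))
done
exit 0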

It's surprisingly short given how useful this wrapper script is and how many new features have been added to an older, crude grep program.

And, here it is in use:


$ sh wegrep.sh 'Alice' wonderland.txt
-----
12:
13:     Alice was beginning to get very tired of sitting by her
14:     sister on the bank, and of having nothing to do: once
-----
16:     reading, but it had no pictures or conversations in it,
17:     'and what is the use of a book,' thought Alice 'without
18:     pictures or conversation?'
-----
27:     There was nothing so very remarkable in that; nor did
28:     Alice think it so very much out of the way to hear the
29:     Rabbit say to itself, 'Oh dear! Oh dear! I shall be
-----

There's still a hiccup in the script, however. Because of the ANSI sequence sed invocation, the proper functionality of regular expressions is lost (try it, you'll see what I mean). Is it a huge problem? Maybe not, but I'm going to leave solving it as an exercise for you, the reader.
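If you'd like a nudge, one possible direction (a sketch, not the only answer) is to let sed hand back whatever text actually matched by using & in the replacement, instead of pasting in the pattern. The highlight function is a hypothetical helper; it assumes a sed that accepts -E for extended regular expressions, as GNU and BSD sed do, and a pattern that contains no slash characters:

```shell
# Highlight every match of an extended regular expression on stdin.
# "&" in the replacement stands for the text that matched, so the
# regex itself never appears in the output.
# Assumes sed supports -E and that the pattern contains no "/".
highlight() {
  esc=$(printf '\033')
  sed -E "s/$1/${esc}[32m&${esc}[0m/g"
}
```

With this in place of the first sed invocation, a pattern such as '^Alice' would color the word Alice at the start of a line without a caret showing up in the output.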

As always, if you have suggestions, let me know via e-mail: dave@linuxjournal.com.

Dave Taylor has been hacking shell scripts on UNIX and Linux systems for a really long time. He's the author of Learning Unix for Mac OS X and Wicked Cool Shell Scripts. You can find him on Twitter as @DaveTaylor, and you can reach him through his tech Q&A site: Ask Dave Taylor.
