Quantcast
Username/Email:  Password: 

Bash Regular Expressions

When working with regular expressions in a shell script the norm is to use grep or sed or some other external command/program. Since version 3 of bash (released in 2004) there is another option: bash's built-in regular expression comparison operator "=~".

Bash's regular expression comparison operator takes a string on the left and an extended regular expression on the right. It returns 0 (success) if the regular expression matches the string, otherwise it returns 1 (failure).

In addition to doing simple matching, bash regular expressions support sub-patterns surrounded by parenthesis for capturing parts of the match. The matches are assigned to an array variable BASH_REMATCH. The entire match is assigned to BASH_REMATCH[0], the first sub-pattern is assigned to BASH_REMATCH[1], etc..

The following example script takes a regular expression as its first argument and one or more strings to match against. It then cycles through the strings and outputs the results of the match process:

#!/bin.bash

if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done

Assuming the script is saved in "bashre.sh", the following sample shows its output:

  # sh bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
  regex: aa(b{2,3}[xyz])cc

  aabbxcc matches
    capture[1]: bbx
  aabbcc does not match

______________________

Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

bash-regular-expressions

kenb's picture

I wrote the suggested bash script and got the demonstrated result. However when I invoked:
sh bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc aabbyccaabbbzcc
I expected to get two matches with the last parameter but I only got one. I'm surprised
that I'm the only one so what did I do wrong?
regex: aa(b{2,3}[xyz])cc

aabbxcc matches
capture[1]: bbx
aabbcc does not match
aabbyccaabbbzcc matches
capture[1]: bby

add color and better indentation to the output

Albert Bicchi's picture

#!/bin/sh

if [[ $# -lt 2 ]]; then
    echo "Usage: regex PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo -e "\t\E[42;37m${1} - matches\E[33;0m"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo -e "\t\t\E[43;37mcapture[$i]: ${BASH_REMATCH[$i]}\E[33;0m"
            let i++
        done
    else
        echo -e "\t\E[41;37m${1} - does not match\E[33;0m"
    fi
    shift
done

Is "(( $# < 2 ))" an

Anonymous's picture

Is "(( $# < 2 ))" an alternative conditional expression for the line "[[ $# -lt 2 ]]"?

Could you discuss BASH expressions with [[]] (()) and their valid operators. It seems the -lt, -gt, -a,
etc, can be replaced with <, >, &&, etc, if used with (()) --- replacing [] with (()) (numeric) and [[]] (strings).

Thank you.

PS: The captcha is really hard to read. It would be nice it there was an option to generate a new one that could possible be read by a mere human.

Is "(( $# < 2 ))" an

Anonymous's picture

Is "(( $# < 2 ))" an alternative conditional expression for the line "[[ $# -lt 2 ]]"?

Could you discuss BASH expressions with [[]] (()) and their valid operators. It seems the -lt, -gt, -a,
etc, can be replaced with <, >, &&, etc, if used with (()) --- replacing [] with (()) (numeric) and [[]] (strings).

Thank you.

That simple?!

Robert de Bock's picture

My god, Bash sure is a great tool! Thanks for the information.

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <pre> <ul> <ol> <li> <dl> <dt> <dd> <i> <b>
  • Lines and paragraphs break automatically.
  • Use to create page breaks.

More information about formatting options