Bash Regular Expressions

When working with regular expressions in a shell script, the norm is to use grep, sed, or some other external program. Since version 3 of bash (released in 2004), there is another option: bash's built-in regular expression comparison operator, "=~".

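Bash's regular expression comparison operator, which is available only inside the [[ ]] conditional construct, takes a string on the left and an extended regular expression on the right. It returns 0 (success) if the regular expression matches the string and 1 (failure) if it does not. If the regular expression is syntactically invalid, it returns 2.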
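Because the operator reports its result through the exit status, it can be used anywhere a command can. The following is a minimal sketch, not part of the original script; the function name is_integer and the sample values are made up for illustration:

```bash
#!/bin/bash
# Sketch: use the exit status of [[ string =~ regex ]] as a function result.
# is_integer is a hypothetical helper name chosen for this example.
is_integer() {
    local re='^-?[0-9]+$'   # keep the pattern in a variable, unquoted on use
    [[ $1 =~ $re ]]         # the function returns the status of the match
}

for value in 42 -7 4x2; do
    if is_integer "$value"; then
        echo "$value: integer"
    else
        echo "$value: not an integer"
    fi
done
```

Keeping the pattern in a variable sidesteps quoting questions: in bash 3.2 and later, a quoted right-hand side is matched as a literal string rather than as a regular expression.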

In addition to doing simple matching, bash regular expressions support sub-patterns surrounded by parentheses for capturing parts of the match. The matches are assigned to the array variable BASH_REMATCH. The entire match is assigned to BASH_REMATCH[0], the first sub-pattern is assigned to BASH_REMATCH[1], and so on.
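As a small illustration of the capture array, the sketch below splits a date string into its parts. The pattern, the variable names, and the sample date are assumptions made for this example:

```bash
#!/bin/bash
# Sketch: pull the parts of a YYYY-MM-DD string out of BASH_REMATCH.
# date_re, input, and the sample value are illustrative only.
date_re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
input='2004-07-27'

if [[ $input =~ $date_re ]]; then
    whole=${BASH_REMATCH[0]}   # entire match
    year=${BASH_REMATCH[1]}    # first sub-pattern
    month=${BASH_REMATCH[2]}   # second sub-pattern
    day=${BASH_REMATCH[3]}     # third sub-pattern
    echo "matched '$whole': year=$year month=$month day=$day"
else
    echo "'$input' is not in YYYY-MM-DD form"
fi
```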

The following example script takes a regular expression as its first argument and one or more strings to match against. It then cycles through the strings and outputs the results of the match process:

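#!/bin/bash
# Match each STRING argument against PATTERN and print any captured groups.

if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

# Loop while arguments remain, so an empty-string argument is still tested.
while [[ $# -gt 0 ]]
do
    # $regex is left unquoted so that it is treated as a pattern.
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        # BASH_REMATCH[0] is the whole match; captures start at index 1.
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done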

Assuming the script is saved in "bashre.sh", the following sample shows its output:

  $ bash bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
  regex: aa(b{2,3}[xyz])cc

  aabbxcc matches
    capture[1]: bbx
  aabbcc does not match

______________________

Mitch Frazier is an Associate Editor for Linux Journal.

Comments

bash-regular-expressions

Anonymous's picture

if [[ $1 =~ $regex ]]; then
This is not working in Fedora Core 10. Can anybody help?

More Details

Mitch Frazier's picture

What command line did you use to invoke the script? What version of bash did you use (bash --version)?

\w is a word doesnt work in

Anonymous's picture

\w (a word character) doesn't work in bash regexps?

\w is not supposed to "work"

Anonymous's picture

\w is a Perl regex atom. Bash uses POSIX extended regular expressions instead.

See man 7 regex for details.
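A portable stand-in for \w is a POSIX bracket expression. The sketch below is one way to write it; the variable name word_re and the test strings are made up for illustration. Some C libraries may accept \w as an extension, but it is not part of POSIX extended regular expressions, so it should not be relied on.

```bash
#!/bin/bash
# Sketch: [[:alnum:]_] as a portable replacement for Perl's \w.
# word_re and the sample strings are illustrative only.
word_re='^[[:alnum:]_]+$'

for s in foo_bar42 'foo bar'; do
    if [[ $s =~ $word_re ]]; then
        echo "'$s' is a single word"
    else
        echo "'$s' is not a single word"
    fi
done
```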

bash-regular-expressions

kenb's picture

I wrote the suggested bash script and got the demonstrated result. However when I invoked:
sh bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc aabbyccaabbbzcc
I expected to get two matches with the last parameter but I only got one. I'm surprised
that I'm the only one so what did I do wrong?
regex: aa(b{2,3}[xyz])cc

aabbxcc matches
capture[1]: bbx
aabbcc does not match
aabbyccaabbbzcc matches
capture[1]: bby

Substring matches

Anonymous's picture

Your regular expression has only a single pair of parentheses, so you will not have more than one substring saved in BASH_REMATCH. If you had two pairs of parentheses in your regular expression, you could have two substrings saved in BASH_REMATCH, etc. Here's an example with three pairs of parentheses:

$ sh bashre.sh '((aa(b{2,3}[xyz])cc)+)' aabbxcc aabbcc aabbyccaabbbzcc
regex: ((aa(b{2,3}[xyz])cc)+)

aabbxcc matches
  capture[1]: aabbxcc
  capture[2]: aabbxcc
  capture[3]: bbx
aabbcc does not match
aabbyccaabbbzcc matches
  capture[1]: aabbyccaabbbzcc
  capture[2]: aabbbzcc
  capture[3]: bbbz

Bash doesn't appear to have a "global" pattern matching switch a la perl's //g option.
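One possible workaround, offered as a sketch rather than a built-in feature, is to match in a loop and strip everything up to the end of each match before trying again. The variable names below are made up for this example, and the approach assumes the pattern never matches an empty string:

```bash
#!/bin/bash
# Sketch: emulate a "global" match by matching repeatedly and dropping
# the text up to and including each match. Names are illustrative only.
regex='aa(b{2,3}[xyz])cc'
rest='aabbyccaabbbzcc'
captures=()

while [[ $rest =~ $regex ]]; do
    # An empty match would never shorten $rest, so stop instead of looping.
    [[ -n ${BASH_REMATCH[0]} ]] || break
    captures+=("${BASH_REMATCH[1]}")
    # Remove the shortest prefix that ends with the matched text.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf 'capture: %s\n' "${captures[@]}"
```

For the string from the earlier comment, this collects both bby and bbbz.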

add color and better indentation to the output

Albert Bicchi's picture

#!/bin/bash

if [[ $# -lt 2 ]]; then
    echo "Usage: regex PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo -e "\t\E[42;37m${1} - matches\E[33;0m"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo -e "\t\t\E[43;37mcapture[$i]: ${BASH_REMATCH[$i]}\E[33;0m"
            let i++
        done
    else
        echo -e "\t\E[41;37m${1} - does not match\E[33;0m"
    fi
    shift
done

Is "(( $# < 2 ))" an

Anonymous's picture

Is "(( $# < 2 ))" an alternative conditional expression for the line "[[ $# -lt 2 ]]"?

Could you discuss BASH expressions with [[]] (()) and their valid operators. It seems the -lt, -gt, -a,
etc, can be replaced with <, >, &&, etc, if used with (()) --- replacing [] with (()) (numeric) and [[]] (strings).

Thank you.

PS: The captcha is really hard to read. It would be nice if there were an option to generate a new one that could possibly be read by a mere human.
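For comparison, here is a minimal sketch of the same argument-count check written both ways. The function name count_check is hypothetical. Inside (( )) the expression is evaluated arithmetically with C-style operators, while inside [[ ]] the -lt and -gt operators compare integers and < and > compare strings.

```bash
#!/bin/bash
# Sketch: the same numeric test written with [[ ]] and with (( )).
# count_check is a hypothetical helper used only for this comparison.
count_check() {
    if [[ $# -lt 2 ]]; then    # [[ ]]: -lt compares integers
        echo "[[ ]]: fewer than 2 arguments"
    fi
    if (( $# < 2 )); then      # (( )): arithmetic evaluation, C-style operators
        echo "(( )): fewer than 2 arguments"
    fi
    # Inside [[ ]], < and > compare strings, not numbers.
    if [[ 10 < 9 ]]; then
        echo "[[ 10 < 9 ]] is true as a string comparison"
    fi
}

count_check one
```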

That simple?!

Robert de Bock's picture

My god, Bash sure is a great tool! Thanks for the information.
