Bash Regular Expressions

When working with regular expressions in a shell script, the norm is to use grep, sed, or some other external command. Since version 3 of bash (released in 2004), there is another option: bash's built-in regular expression comparison operator, "=~", which is used inside a [[ ]] conditional expression.

Bash's regular expression comparison operator takes a string on the left and an extended regular expression on the right. It returns 0 (success) if the regular expression matches the string; otherwise, it returns 1 (failure).
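As a minimal sketch of the return status described above (the sample strings and variable names are made up for illustration), the operator can be used directly as an if condition, or its status can be read from $?:

```bash
#!/bin/bash
# Illustrative values only; none of these names come from the article.
input="release-2024"
pattern='^release-[0-9]{4}$'   # storing the regex in a variable sidesteps quoting issues

# A successful match returns 0, so the "then" branch runs.
if [[ $input =~ $pattern ]]; then
    result="match"
else
    result="no match"
fi
echo "$input: $result"

# A failed match returns 1, which can be read from $? like any other exit status.
digits='^[0-9]+$'
[[ "abc" =~ $digits ]]
status=$?
echo "status for a non-matching string: $status"
```

Bash also documents a return value of 2 for a syntactically invalid regular expression, so a script can tell "no match" apart from "bad pattern".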

In addition to doing simple matching, bash regular expressions support sub-patterns surrounded by parentheses for capturing parts of the match. The matches are assigned to the array variable BASH_REMATCH. The entire match is assigned to BASH_REMATCH[0], the first sub-pattern is assigned to BASH_REMATCH[1], and so on.
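A small sketch of how the captures land in BASH_REMATCH, assuming a date-like sample string (the string and the variable names are hypothetical):

```bash
#!/bin/bash
# Sample input and names are assumptions made for this illustration.
stamp="2009-03-17"
date_re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'

if [[ $stamp =~ $date_re ]]; then
    whole=${BASH_REMATCH[0]}   # the entire matched text
    year=${BASH_REMATCH[1]}    # first parenthesized sub-pattern
    month=${BASH_REMATCH[2]}   # second sub-pattern
    day=${BASH_REMATCH[3]}     # third sub-pattern
    echo "whole=$whole year=$year month=$month day=$day"
fi
```

Note that BASH_REMATCH is overwritten by every later use of "=~", so copy out any values that are needed before matching again.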

The following example script takes a regular expression as its first argument and one or more strings to match against. It then cycles through the strings and outputs the results of the match process:

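#!/bin/bash

# Require a pattern and at least one string to test.
if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

# Test each remaining argument against the pattern.
while [[ $# -gt 0 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        # BASH_REMATCH[0] is the whole match; captures start at index 1.
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done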

Assuming the script is saved in "bashre.sh", the following sample shows its output:

  # bash bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
  regex: aa(b{2,3}[xyz])cc

  aabbxcc matches
    capture[1]: bbx
  aabbcc does not match

______________________

Mitch Frazier is an Associate Editor for Linux Journal.

Comments

bash-regular-expressions

Anonymous's picture

if [[ $1 =~ $regex ]]; then
This is not working in Fedora Core 10. Can anybody help?

More Details

Mitch Frazier's picture

What command line did you use to invoke the script? What version of bash did you use (bash --version)?

\w is a word doesnt work in

Anonymous's picture

\w is a word character; doesn't it work in bash regexps?

\w is not supposed to "work"

Anonymous's picture

\w is a Perl regex atom. Bash uses POSIX extended regular expressions instead.

See "man 7 regex" for details.
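For what it's worth, a rough sketch of the POSIX way to express "word characters" is a bracket expression with a character class. The helper name is_word is made up for this example:

```bash
#!/bin/bash
# is_word STRING: succeed if STRING consists only of letters, digits, and
# underscores. [[:alnum:]_] stands in for Perl's \w here.
# (is_word is a hypothetical helper, not something bash provides.)
is_word() {
    local word_re='^[[:alnum:]_]+$'
    [[ $1 =~ $word_re ]]
}

for s in "foo_bar1" "foo bar"; do
    if is_word "$s"; then
        echo "'$s' is a word"
    else
        echo "'$s' is not a word"
    fi
done
```

Which characters count as [[:alnum:]] depends on the current locale, so results for non-ASCII input may vary.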

bash-regular-expressions

kenb's picture

I wrote the suggested bash script and got the demonstrated result. However, when I invoked:
sh bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc aabbyccaabbbzcc
I expected to get two matches for the last parameter, but I got only one. I'm surprised
that I'm the only one, so what did I do wrong?
regex: aa(b{2,3}[xyz])cc

aabbxcc matches
  capture[1]: bbx
aabbcc does not match
aabbyccaabbbzcc matches
  capture[1]: bby

Substring matches

Anonymous's picture

Your regular expression has only a single pair of parentheses, so you will not have more than one substring saved in BASH_REMATCH. If you had two pairs of parentheses in your regular expression, you could have two substrings saved in BASH_REMATCH, etc. Here's an example with three pairs of parentheses:

$ sh bashre.sh '((aa(b{2,3}[xyz])cc)+)' aabbxcc aabbcc aabbyccaabbbzcc
regex: ((aa(b{2,3}[xyz])cc)+)

aabbxcc matches
  capture[1]: aabbxcc
  capture[2]: aabbxcc
  capture[3]: bbx
aabbcc does not match
aabbyccaabbbzcc matches
  capture[1]: aabbyccaabbbzcc
  capture[2]: aabbbzcc
  capture[3]: bbbz

Bash doesn't appear to have a "global" pattern matching switch like Perl's //g option.
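One possible workaround, sketched here under the assumption that the matches do not overlap, is to loop: match once, cut the string off just past the match, and match again on what is left. The function name find_all is invented for this example:

```bash
#!/bin/bash
# find_all REGEX STRING: print capture group 1 of each successive match.
# (find_all is a hypothetical helper, not a bash built-in.)
find_all() {
    local re=$1 rest=$2
    while [[ $rest =~ $re ]]; do
        echo "${BASH_REMATCH[1]}"
        # An empty match would never shorten the string, so stop to avoid looping forever.
        [[ -n ${BASH_REMATCH[0]} ]] || break
        # Drop everything up to and including this match, then search the remainder.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

find_all 'aa(b{2,3}[xyz])cc' aabbyccaabbbzcc
```

With the pattern and string from the question above, this prints bby and then bbbz, one per line.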

add color and better indentation to the output

Albert Bicchi's picture

#!/bin/bash

if [[ $# -lt 2 ]]; then
    echo "Usage: regex PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo -e "\t\E[42;37m${1} - matches\E[33;0m"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo -e "\t\t\E[43;37mcapture[$i]: ${BASH_REMATCH[$i]}\E[33;0m"
            let i++
        done
    else
        echo -e "\t\E[41;37m${1} - does not match\E[33;0m"
    fi
    shift
done

Is "(( $# < 2 ))" an

Anonymous's picture

Is "(( $# < 2 ))" an alternative conditional expression for the line "[[ $# -lt 2 ]]"?

Could you discuss bash expressions with [[ ]] and (( )) and their valid operators? It seems that -lt, -gt, -a,
etc. can be replaced with <, >, &&, etc. if used with (( )), replacing [ ] with (( )) (numeric) and [[ ]] (strings).

Thank you.

PS: The captcha is really hard to read. It would be nice if there were an option to generate a new one that could possibly be read by a mere human.
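On the question itself, a short sketch (the variable names are illustrative only) suggests the two forms are interchangeable for numeric tests, with one catch: inside [[ ]], < and > compare strings, not numbers.

```bash
#!/bin/bash
count=3   # stand-in for $# in this illustration

# Arithmetic context: C-style operators, and variables need no leading $.
if (( count < 5 && count > 1 )); then arith="true"; else arith="false"; fi

# Conditional expression: -lt and -gt compare integers, && joins the tests.
if [[ $count -lt 5 && $count -gt 1 ]]; then cond="true"; else cond="false"; fi

# Catch: inside [[ ]], < is a string comparison, so "10" sorts before "9".
if [[ 10 < 9 ]]; then string_cmp="true"; else string_cmp="false"; fi

# The same test in arithmetic context compares the numbers.
if (( 10 < 9 )); then number_cmp="true"; else number_cmp="false"; fi

echo "arith=$arith cond=$cond string_cmp=$string_cmp number_cmp=$number_cmp"
```

So "(( $# < 2 ))" should behave the same as "[[ $# -lt 2 ]]", but "[[ $# < 2 ]]" would be a string comparison.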

That simple?!

Robert de Bock's picture

My god, Bash sure is a great tool! Thanks for the information.
