Bash Regular Expressions

When working with regular expressions in a shell script, the norm is to use grep, sed, or some other external command. Since version 3 of bash (released in 2004), there is another option: bash's built-in regular expression comparison operator, "=~".

Bash's regular expression comparison operator is used inside a [[ ]] conditional expression. It takes a string on the left and an extended regular expression on the right, and returns 0 (success) if the regular expression matches the string; otherwise it returns 1 (failure).
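
As a minimal sketch of that exit status, the snippet below wraps a match in a function and prints the status it returns. The function name is_number and the pattern are illustrative assumptions, not part of the original article:

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the argument consists only of
# digits, and 1 otherwise. The function's status is that of [[ ... ]].
is_number() {
    local re='^[0-9]+$'     # pattern kept in a variable, used unquoted
    [[ $1 =~ $re ]]
}

is_number 12345; echo "12345 -> $?"
is_number 12a45; echo "12a45 -> $?"
```

Because the result is an ordinary exit status, the operator can be used directly in if, while, && and || constructs. Bash also documents a status of 2 when the regular expression itself is syntactically invalid.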

In addition to doing simple matching, bash regular expressions support sub-patterns surrounded by parentheses for capturing parts of the match. The matches are assigned to the array variable BASH_REMATCH. The entire match is assigned to BASH_REMATCH[0], the first sub-pattern is assigned to BASH_REMATCH[1], and so on.
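
A short sketch of how the captures land in BASH_REMATCH, assuming an arbitrary sample date string and a pattern chosen only for illustration:

```bash
#!/usr/bin/env bash
# Illustrative pattern: three parenthesized groups for year, month, day.
date_re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
input='2004-07-27'          # arbitrary sample value

if [[ $input =~ $date_re ]]; then
    year=${BASH_REMATCH[1]}     # first sub-pattern
    month=${BASH_REMATCH[2]}    # second sub-pattern
    day=${BASH_REMATCH[3]}      # third sub-pattern
    echo "whole match: ${BASH_REMATCH[0]}"
    echo "year=$year month=$month day=$day"
fi
```

Copying the captures into named variables right away is a cautious habit, since the next use of "=~" overwrites BASH_REMATCH.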

The following example script takes a regular expression as its first argument and one or more strings to match against. It then cycles through the strings and outputs the results of the match process:

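#!/bin/bash

# Require a pattern plus at least one string to test.
if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

# Test each remaining argument against the pattern.
while [[ $# -gt 0 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        # BASH_REMATCH[0] is the whole match; captures start at index 1.
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done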

Assuming the script is saved in "bashre.sh", the following sample shows its output:

  # bash bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
  regex: aa(b{2,3}[xyz])cc

  aabbxcc matches
    capture[1]: bbx
  aabbcc does not match
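
Note that the script uses $regex unquoted on the right of "=~". The sketch below, assuming bash 3.2 or later with default compatibility settings, shows why: quoting the right-hand side makes bash match it as a literal string rather than as a regular expression. The variable names are illustrative only:

```bash
#!/usr/bin/env bash
re='a.c'        # as a regex, "." matches any single character

# Unquoted: $re is interpreted as a regular expression.
if [[ abc =~ $re ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted: the pattern is compared as the literal text "a.c".
if [[ abc =~ "$re" ]]; then quoted=match; else quoted=nomatch; fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

On such versions the unquoted test should report a match and the quoted one should not, so storing the pattern in a variable and expanding it unquoted is usually the least surprising approach.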

______________________

Mitch Frazier is an Associate Editor for Linux Journal.

Comments

bash-regular-expressions

Anonymous's picture

if [[ $1 =~ $regex ]]; then
This is not working in Fedora Core 10. Can anybody help?

More Details

Mitch Frazier's picture

What command line did you use to invoke the script? What version of bash did you use (bash --version)?

\w doesn't work in bash regexp?

Anonymous's picture

\w (a word character) doesn't work in bash regexp?

\w is not supposed to "work"

Anonymous's picture

\w is a Perl regex atom. Bash uses POSIX extended regular expressions instead.

See man 7 regex for details.
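
A portable stand-in for \w is a POSIX bracket expression. The sketch below assumes that "word character" means letters, digits and underscore; the helper name is_word is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the argument is made up entirely
# of "word" characters, written as a POSIX bracket expression.
is_word() {
    local word_re='^[[:alnum:]_]+$'
    [[ $1 =~ $word_re ]]
}

for s in hello_world 'two words' abc123; do
    if is_word "$s"; then
        echo "'$s': word"
    else
        echo "'$s': not a word"
    fi
done
```

Some C libraries may accept \w as an extension, but relying on the bracket expression avoids depending on that.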

bash-regular-expressions

kenb's picture

I wrote the suggested bash script and got the demonstrated result. However when I invoked:
sh bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc aabbyccaabbbzcc
I expected to get two matches with the last parameter but I only got one. I'm surprised
that I'm the only one so what did I do wrong?
regex: aa(b{2,3}[xyz])cc

aabbxcc matches
  capture[1]: bbx
aabbcc does not match
aabbyccaabbbzcc matches
  capture[1]: bby

Substring matches

Anonymous's picture

Your regular expression has only a single pair of parentheses, so you will not have more than one substring saved in BASH_REMATCH. If you had two pairs of parentheses in your regular expression, you could have two substrings saved in BASH_REMATCH, etc. Here's an example with three pairs of parentheses:

$ sh bashre.sh '((aa(b{2,3}[xyz])cc)+)' aabbxcc aabbcc aabbyccaabbbzcc
regex: ((aa(b{2,3}[xyz])cc)+)

aabbxcc matches
  capture[1]: aabbxcc
  capture[2]: aabbxcc
  capture[3]: bbx
aabbcc does not match
aabbyccaabbbzcc matches
  capture[1]: aabbyccaabbbzcc
  capture[2]: aabbbzcc
  capture[3]: bbbz

Bash doesn't appear to have a "global" pattern matching switch a la perl's //g option.
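
One possible workaround is to loop, removing everything up to and including each match before trying again. This is only a sketch: the function name match_all is hypothetical, and the prefix-stripping step assumes an unanchored pattern (with ^ or $ in the pattern it can misbehave):

```bash
#!/usr/bin/env bash
# match_all REGEX STRING
# Prints every non-overlapping match of REGEX in STRING, one per line.
match_all() {
    local re=$1 rest=$2 m
    while [[ -n $rest && $rest =~ $re ]]; do
        m=${BASH_REMATCH[0]}
        [[ -z $m ]] && break        # an empty match would loop forever
        printf '%s\n' "$m"
        rest=${rest#*"$m"}          # drop the text through this match
    done
}

match_all 'aa(b{2,3}[xyz])cc' aabbyccaabbbzcc
```

For the string from the comment above, this should print aabbycc and then aabbbzcc. The captures for each match are available in BASH_REMATCH inside the loop if they are needed.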

add color and better indentation to the output

Albert Bicchi's picture

#!/bin/bash

if [[ $# -lt 2 ]]; then
    echo "Usage: regex PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo -e "\t\E[42;37m${1} - matches\E[33;0m"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo -e "\t\t\E[43;37mcapture[$i]: ${BASH_REMATCH[$i]}\E[33;0m"
            let i++
        done
    else
        echo -e "\t\E[41;37m${1} - does not match\E[33;0m"
    fi
    shift
done

Is "(( $# < 2 ))" an

Anonymous's picture

Is "(( $# < 2 ))" an alternative conditional expression for the line "[[ $# -lt 2 ]]"?

Could you discuss BASH expressions with [[]] (()) and their valid operators. It seems the -lt, -gt, -a,
etc, can be replaced with <, >, &&, etc, if used with (()) --- replacing [] with (()) (numeric) and [[]] (strings).

Thank you.

PS: The captcha is really hard to read. It would be nice if there were an option to generate a new one that could possibly be read by a mere human.
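
For the argument-count test the two forms do behave the same. The sketch below compares them, assuming a simulated argument list set with "set --". It also shows one difference worth knowing: inside [[ ]], < and > compare strings, while (( )) compares numbers.

```bash
#!/usr/bin/env bash
set -- one                  # simulate a script called with one argument

# Both tests check "fewer than two arguments".
if (( $# < 2 ));    then echo "(( )): fewer than 2 arguments"; fi
if [[ $# -lt 2 ]];  then echo "[[ ]]: fewer than 2 arguments"; fi

# Inside [[ ]], < is a string comparison, so "10" sorts before "9".
if [[ 10 < 9 ]];    then echo "[[ 10 < 9 ]] is true (string order)"; fi

# Inside (( )), the comparison is arithmetic.
if (( 10 > 9 ));    then echo "(( 10 > 9 )) is true (numeric)"; fi
```

Within [[ ]], conditions are combined with && and || rather than the -a and -o operators of the older [ ] test command.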

That simple?!

Robert de Bock's picture

My god, Bash sure is a great tool! Thanks for the information.
