Bash Regular Expressions
When working with regular expressions in a shell script, the norm is to use grep, sed, or some other external program. Since version 3 of bash (released in 2004), there is another option: bash's built-in regular expression comparison operator, "=~".
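A minimal side-by-side sketch, using a made-up sample string and pattern. Both grep -E and =~ use POSIX extended regular expressions, so the same pattern works in each. The difference is that grep runs as a separate process and reads its input line by line, while =~ tests the whole string inside the shell.

```shell
#!/bin/bash
# Hypothetical sample string and pattern, for illustration only.
s="aabbxcc"
re='b{2,3}[xyz]'

# External program: bash starts a grep process and feeds it through a pipe.
if printf '%s\n' "$s" | grep -Eq -- "$re"; then
    echo "grep: match"
fi

# Built-in operator: the match happens inside bash itself (bash 3.0 and later).
if [[ $s =~ $re ]]; then
    echo "=~: match"
fi
```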
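Bash's regular expression comparison operator, which is available only inside the [[ ]] conditional command, takes a string on the left and a POSIX extended regular expression on the right. It returns 0 (success) if the regular expression matches any part of the string, 1 (failure) if it does not, and 2 if the regular expression is syntactically invalid. Use ^ and $ to anchor the pattern when the whole string must match.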
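The three possible exit statuses can be seen with a small helper. This is a sketch; the function name status_of is hypothetical. The last call passes an unbalanced parenthesis, which bash cannot compile as a regular expression and reports with status 2.

```shell
#!/bin/bash
# Hypothetical helper: print the exit status of [[ string =~ regex ]].
status_of() {
    local re=$2          # keep the pattern in a variable, used unquoted below
    [[ $1 =~ $re ]]
    echo $?
}

status_of "aabbxcc" 'b{2,3}[xyz]'   # pattern found inside the string
status_of "aabbcc"  'b{2,3}[xyz]'   # no match
status_of "abc"     '('             # invalid regular expression
```

Run as-is, it prints 0, 1 and 2 on separate lines.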
In addition to simple matching, bash regular expressions support sub-patterns enclosed in parentheses for capturing parts of the match. The matched text is stored in the array variable BASH_REMATCH: the entire match is assigned to BASH_REMATCH[0], the text matched by the first sub-pattern to BASH_REMATCH[1], and so on.
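As an illustration, here is a sketch that splits a date into its parts with three sub-patterns. The input string and variable names are made up for the example. The captures are copied into ordinary variables because the next =~ test overwrites BASH_REMATCH.

```shell
#!/bin/bash
# Hypothetical input: a date in YYYY-MM-DD form.
date_str="2012-02-10"
re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'

if [[ $date_str =~ $re ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"   # 2012-02-10
    year=${BASH_REMATCH[1]}                  # first sub-pattern
    month=${BASH_REMATCH[2]}                 # second sub-pattern
    day=${BASH_REMATCH[3]}                   # third sub-pattern
    echo "year=$year month=$month day=$day"
fi
```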
The following example script takes a regular expression as its first argument, followed by one or more strings to match against. It loops through the strings and prints the result of each match, including any captured sub-patterns:
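#!/bin/bash
# Usage: bashre.sh PATTERN STRINGS...
if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..." >&2
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo
# Test the argument count rather than $1, so an empty-string
# argument does not end the loop early.
while [[ $# -gt 0 ]]
do
    # $regex is left unquoted so it is treated as a regular expression.
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        # BASH_REMATCH[0] is the whole match; print captures 1..n-1.
        i=1
        n=${#BASH_REMATCH[@]}
        while [[ $i -lt $n ]]
        do
            echo " capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done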
Assuming the script is saved in "bashre.sh", the following sample shows its output:
# bash bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
regex: aa(b{2,3}[xyz])cc
aabbxcc matches
capture[1]: bbx
aabbcc does not match
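One detail of the script's `[[ $1 =~ $regex ]]` line is worth a closer look: the pattern is kept in a variable and left unquoted. Since bash 3.2, any quoted part of the right-hand side of =~ is matched as a literal string, so quoting the variable silently turns the regular expression into a plain substring search. A small sketch with a made-up string and pattern:

```shell
#!/bin/bash
str="aabbxcc"
re='b{2,3}[xyz]'

# Unquoted: the variable's value is used as a regular expression.
if [[ $str =~ $re ]]; then
    echo "unquoted: match (${BASH_REMATCH[0]})"
fi

# Quoted: the literal characters b{2,3}[xyz] are searched for, so this fails.
if [[ $str =~ "$re" ]]; then
    echo "quoted: match"
else
    echo "quoted: no match"
fi
```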
Mitch Frazier is an Associate Editor for Linux Journal.
Comments
bash-regular-expressions
if [[ $1 =~ $regex ]]; then
This is not working in Fedora Core 10. Can anybody help?
More Details
What command line did you use to invoke the script? What version of bash did you use (bash --version)?
Mitch Frazier is an Associate Editor for Linux Journal.
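Because the answer depends on which shell actually runs the script, a script can also check this for itself. The following is a minimal sketch, not taken from the article: the messages are made up, and the version 3 minimum simply mirrors the statement that =~ arrived in bash 3. The first test uses plain [ ] so that a non-bash sh (such as dash) can still parse it and exit cleanly.

```shell
#!/bin/bash
# Stop early if not running under bash at all (e.g. started as "sh script"
# on a system where sh is dash); BASH_VERSION is only set by bash.
if [ -z "${BASH_VERSION:-}" ]; then
    echo "This script needs bash; run it as: bash $0" >&2
    exit 1
fi

# BASH_VERSINFO[0] holds the major version number; =~ needs bash 3 or later.
if (( BASH_VERSINFO[0] < 3 )); then
    echo "bash $BASH_VERSION is too old for =~" >&2
    exit 1
fi

echo "running under bash $BASH_VERSION"
```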
Doesn't \w (a word character) work in bash regular expressions?
\w is not supposed to "work"
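\w is a Perl regex escape. Bash uses POSIX extended regular expressions instead, which do not define \w, so it cannot be relied on. Use a bracket expression such as [[:alnum:]_] to match a word character.
See man 7 regex for details.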
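A sketch of the usual substitutes, using POSIX character classes inside bracket expressions. The variable names and test strings are illustrative only.

```shell
#!/bin/bash
# POSIX bracket expressions standing in for Perl's \w, \d and \s.
word='[[:alnum:]_]'    # roughly \w
digit='[[:digit:]]'    # roughly \d
space='[[:space:]]'    # roughly \s

# A word, whitespace, then digits, covering the whole string.
re="^${word}+${space}+${digit}+$"

for s in "foo_bar 42" "foo-bar 42" "foo 4x"; do
    if [[ $s =~ $re ]]; then
        echo "$s: match"
    else
        echo "$s: no match"
    fi
done
```

Only "foo_bar 42" matches: "-" is not a word character, and "x" is not a digit.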
bash-regular-expressions
I wrote the suggested bash script and got the demonstrated result. However, when I invoked:
sh bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc aabbyccaabbbzcc
I expected to get two matches for the last argument, but I only got one. I'm surprised I'm the only one to hit this, so what did I do wrong?
regex: aa(b{2,3}[xyz])cc
aabbxcc matches
capture[1]: bbx
aabbcc does not match
aabbyccaabbbzcc matches
capture[1]: bby
Substring matches
Your regular expression has only a single pair of parentheses, so you will not have more than one substring saved in BASH_REMATCH. If you had two pairs of parentheses in your regular expression, you could have two substrings saved in BASH_REMATCH, etc. Here's an example with three pairs of parentheses:
$ sh bashre.sh '((aa(b{2,3}[xyz])cc)+)' aabbxcc aabbcc aabbyccaabbbzcc
regex: ((aa(b{2,3}[xyz])cc)+)
aabbxcc matches
capture[1]: aabbxcc
capture[2]: aabbxcc
capture[3]: bbx
aabbcc does not match
aabbyccaabbbzcc matches
capture[1]: aabbyccaabbbzcc
capture[2]: aabbbzcc
capture[3]: bbbz
Bash doesn't appear to have a "global" pattern matching switch à la Perl's //g option.
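Perl's //g has no direct equivalent, but a loop can approximate it. The sketch below (the function name global_match is hypothetical) matches repeatedly, prints the first capture of each match, and then drops the input up to the end of that match. It assumes the pattern has no ^ or $ anchors and cannot match the empty string, since an empty match would never shorten the input and the loop would not end.

```shell
#!/bin/bash
# Hypothetical helper: print capture group 1 of every non-overlapping match.
# Assumes an unanchored pattern that cannot match the empty string.
global_match() {
    local regex=$1 rest=$2
    while [[ $rest =~ $regex ]]; do
        echo "${BASH_REMATCH[1]}"
        # Remove everything up to and including this match. The quotes make
        # the matched text literal inside the ${var#pattern} expansion.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

global_match 'aa(b{2,3}[xyz])cc' aabbyccaabbbzcc
```

For the commenter's input this prints bby and then bbbz.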
add color and better indentation to the output
#!/bin/bash
if [[ $# -lt 2 ]]; then
    echo "Usage: regex PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo
while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        # green background for a matching string
        echo -e "\t\E[42;37m${1} - matches\E[33;0m"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            # yellow background for each capture
            echo -e "\t\t\E[43;37mcapture[$i]: ${BASH_REMATCH[$i]}\E[33;0m"
            let i++
        done
    else
        # red background for a non-matching string
        echo -e "\t\E[41;37m${1} - does not match\E[33;0m"
    fi
    shift
done
Is "(( $# < 2 ))" an alternative conditional expression for the line "[[ $# -lt 2 ]]"?
Could you discuss bash expressions with [[ ]] and (( )) and their valid operators? It seems that -lt, -gt, -a, etc. can be replaced with <, >, &&, etc. when used with (( )), i.e. replacing [ ] with (( )) for numeric tests and with [[ ]] for strings.
Thank you.
PS: The captcha is really hard to read. It would be nice if there were an option to generate a new one that could possibly be read by a mere human.
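A short sketch of the distinction being asked about (the function name check_args is hypothetical). (( $# < 2 )) is an arithmetic equivalent of [[ $# -lt 2 ]]: inside (( )), < and > compare numbers. Inside [[ ]], however, < and > compare strings, so the -lt family is still needed there for numeric tests.

```shell
#!/bin/bash
# Hypothetical function: both argument-count tests agree.
check_args() {
    # [[ ]] with -lt: numeric comparison of the operands
    [[ $# -lt 2 ]] && echo "[[ ]]: too few arguments"
    # (( )) evaluates an arithmetic expression; < is numeric here
    (( $# < 2 )) && echo "(( )): too few arguments"
    # Inside [[ ]], < compares strings: "10" sorts before "9"
    [[ 10 < 9 ]] && echo "[[ ]]: 10 < 9 is true as a string comparison"
}

check_args one
```

With one argument all three lines print; with two arguments only the string-comparison line does.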
That simple?!
My god, Bash sure is a great tool! Thanks for the information.