Bash Shell Script: Building a Better March Madness Bracket

on February 9, 2017

Last year, I wrote an article for Linux Journal titled "Building Your March Madness Bracket". The article arrived just in time for the "March Madness" college basketball series. You see, I don't follow college basketball (or really, any sports at all), but I do like to participate in office pools. And every year, it seems my office likes to fill out the March Madness brackets to see who can best predict the outcomes.

Since I don't follow college basketball, I am not a good judge of which teams might perform better than others. But fortunately, the NCAA ranks the teams for you, so I wrote a Bash script that filled out my March Madness bracket for me. Since teams were ranked 1–16, I used a "D16" method borrowed from tabletop gaming. I thought this was an elegant method to predict the outcomes.

But, there's a bug in my script. Specifically, there's an error in a key assumption for the D16 algorithm, so I'd like to correct that with an improved March Madness script here.

Let's Review What Went Wrong

My Bash script predicted the outcome of a match by comparing the ranking of each team. So, you can throw a D16 "die" to determine if team A wins and another D16 "die" to determine if team B loses, or vice versa. If the two throws agree, you know the outcome of the game: team A wins and team B loses, or team A loses and team B wins.

I asserted that a #1 team should be a strong team, so I assumed the #1 team had 15 out of 16 "chances" to win, and one out of 16 "chances" to lose. Without any other inputs, the #1 ranked team would win if its D16 throw is two or greater, and the #1 team could lose only if the D16 value was one. With that assumption, I wrote this function:


function guesswinner {
  rankA=$1
  rankB=$2

  d16A=$(( ( $RANDOM % 16 ) + 1 ))
  d16B=$(( ( $RANDOM % 16 ) + 1 ))

  if [ $d16A -gt $rankA -a $d16B -le $rankB ] ; then
    # team A wins and team B loses
    return $rankA
  elif [ $d16A -le $rankA -a $d16B -gt $rankB ] ; then
    # team A loses and team B wins
    return $rankB
  else
    # no winner
    return 0
  fi
}

In the guesswinner function, each D16 roll generates a random number 1–16. If the rank of team A is "rankA" and the rank of team B is "rankB," and the D16 roll for team A is "A" and the roll for team B is "B," the function tests two D16 rolls like this:

If A greater than rankA (team A wins) and B less than or equal to rankB (team B loses), then team A wins.
If A less than or equal to rankA (team A loses) and B greater than rankB (team B wins), then team B wins.

But look at what happens if team A is ranked #1 and team B is ranked #16. Team A will always win:

A roll 1–16 will have a 15 out of 16 chance to be greater than 1 (team A wins), and a 1–16 roll will always be less than or equal to 16 (team B loses).
A roll 1–16 will have a 1 out of 16 chance to be less than or equal to 1 (team A loses) but a 1–16 roll will never be greater than 16 (team B wins).

There's no scenario in which a rank #16 team B can win over a rank #1 team A. It's a foregone conclusion that in any match of a rank 1 team versus a rank 16 team, the rank 1 team will always win. That's not right. There should be a slim chance for the rank 16 team to win over the rank 1 team.
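
One way to see this for yourself is to run the original function many times for a 1 versus 16 match and tally the results. The sketch below repeats the function as shown above; the tally variables and the 10,000-trial count are assumptions added for illustration:

```bash
#!/bin/bash
# Tally outcomes of the original guesswinner for a rank 1 vs rank 16 match.

function guesswinner {
  rankA=$1
  rankB=$2

  d16A=$(( ( RANDOM % 16 ) + 1 ))
  d16B=$(( ( RANDOM % 16 ) + 1 ))

  if [ $d16A -gt $rankA -a $d16B -le $rankB ] ; then
    return $rankA
  elif [ $d16A -le $rankA -a $d16B -gt $rankB ] ; then
    return $rankB
  else
    return 0
  fi
}

winsA=0 ; winsB=0 ; none=0
trials=10000   # arbitrary sample size, chosen only for illustration

for (( i = 0 ; i < trials ; i++ )) ; do
  guesswinner 1 16
  case $? in
    1)  winsA=$(( winsA + 1 )) ;;
    16) winsB=$(( winsB + 1 )) ;;
    *)  none=$(( none + 1 )) ;;
  esac
done

echo "rank 1 wins: $winsA, rank 16 wins: $winsB, no winner: $none"
```

The rank 16 count stays at zero no matter how many trials you run, because a D16 roll can never exceed 16. Roughly one trial in 16 ends with no winner at all.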

A Better Algorithm

Instead of a "static" D16 die, we need a custom "die" that has faces relative to the chance of each team to win. Let's consider this simple algorithm to generate a custom die:

Team A gets a=16-rankA+1 sides.
Team B gets b=16-rankB+1 sides.

Under this assumption, a rank 1 team versus a rank 16 team would generate a die with a=16-1+1=16 "team A" sides and b=16-16+1=1 "team B" sides, resulting in a 17-sided die. Similarly, a more even match, such as a rank 8 team versus a rank 9 team, would create a die with a=16-8+1=9 "team A" sides and b=16-9+1=8 "team B" sides, resulting in another 17-sided die.

It's not always a 17-sided die, however. A rank 1 team against a rank 9 team would generate a die with a=16-1+1=16 "team A" sides and b=16-9+1=8 "team B" sides, or a 24-sided die.
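
The arithmetic above can be checked with a few lines of Bash. This is only a sketch; die_sides is a hypothetical helper name and is not part of the script:

```bash
#!/bin/bash
# Print the "team A" sides, the "team B" sides, and the total sides
# of the custom die. die_sides is a hypothetical helper for illustration.

function die_sides {
  # $1 = team A rank
  # $2 = team B rank
  local a=$(( 16 - $1 + 1 ))
  local b=$(( 16 - $2 + 1 ))
  echo "$a $b $(( a + b ))"
}

die_sides 1 16   # prints: 16 1 17
die_sides 8 9    # prints: 9 8 17
die_sides 1 9    # prints: 16 8 24
```

Team A's chance of winning is a out of a+b, so the rank 1 team wins 16 times in 17 against a rank 16 team, while a rank 8 team wins 9 times in 17 against a rank 9 team.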

In Bash, you can simulate a virtual custom "die" through a file. It's simple enough to generate a file with the correct number of "team A" sides and "team B" sides. If you already have calculated a and b as above, you can write a file like this:


( for teamA in $(seq 1 $a) ; do echo $1 ; done
for teamB in $(seq 1 $b) ; do echo $2 ; done ) > die.file

Picking a random value from this file is as easy as randomizing or "shuffling" the file, then selecting the first line. On Linux systems, you can use the shuf(1) program from GNU coreutils to generate a random permutation of lines from a file. This randomizes whatever data you feed into shuf. Once shuffled, you easily can select the first line of the randomized output using head:


( for teamA in $(seq 1 $a) ; do echo $1 ; done
for teamB in $(seq 1 $b) ; do echo $2 ; done ) | shuf | head -1

That simple expression becomes the heart of the improved March Madness script. It operates the way I want it to: a rank 1 team almost always (but not always) will win over a rank 16 team, yet more closely matched games, such as a rank 8 team versus a rank 9 team or a rank 2 team against a rank 3 team, will present more even odds.
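
If shuf is not available, the same weighted draw could be approximated with Bash's built-in $RANDOM. This is an alternative sketch, not the method the script uses, and rolldie is a hypothetical name:

```bash
#!/bin/bash
# Alternative sketch: draw one face of the custom die using $RANDOM
# rather than shuf. rolldie is a hypothetical name, not from the script.

function rolldie {
  # $1 = team A rank
  # $2 = team B rank
  local a=$(( 16 - $1 + 1 ))
  local b=$(( 16 - $2 + 1 ))

  # pick a face from 1 to a+b; faces 1 through a belong to team A
  local face=$(( ( RANDOM % ( a + b ) ) + 1 ))

  if [ "$face" -le "$a" ] ; then
    echo "$1"
  else
    echo "$2"
  fi
}

rolldie 1 16
rolldie 8 9
```

Because $RANDOM ranges over 0 to 32767, taking it modulo a die size such as 17 is very slightly uneven. That should not matter for an office pool, but it is one reason to prefer shuf where you have it.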

Building a Better March Madness Script

The above can be wrapped into a new guesswinner function to predict a contest between two teams, whose ranks are passed as arguments. The function generates the virtual "die" and uses that to guess a winner:


function guesswinner {
  # $1 = team A rank
  # $2 = team B rank

  a=$(( 16 - $1 + 1 ))
  b=$(( 16 - $2 + 1 ))

  win=$( ( for teamA in $(seq 1 $a) ; do echo $1 ; done
  for teamB in $(seq 1 $b) ; do echo $2 ; done ) | shuf | head -1 )

  echo "$1 vs $2 : $win"
  return $win
}
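
Note that the function hands the winner back through its exit status, which the caller reads from $?. That works here because ranks run from 1 to 16 and an exit status can hold values from 0 to 255. A minimal sketch of the calling pattern, using a hypothetical stand-in function named pickrank that always "picks" its first argument:

```bash
#!/bin/bash
# pickrank is a hypothetical stand-in for guesswinner; it always
# reports its first argument as the winner so the result is predictable.

function pickrank {
  echo "$1 vs $2 : $1"
  return $1
}

pickrank 12 5
winner=$?   # read $? right away, before another command overwrites it
echo "winner is $winner"
```

Saving $? into a variable immediately after the call is the important part, since every later command replaces it with its own status.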

Since the March Madness brackets are always played in order, you can write a playbracket function to run through the different iterations of the bracket. Winners from round one are carried into rounds two and three to select an ultimate winner for the bracket in round four:


function playbracket {
  # $1 = name of bracket

  echo -e "\n___ $1 ___"
  echo -e '\nround 1\n'

  guesswinner 1 16
  round1A=$?

  guesswinner 8 9
  round1B=$?

  guesswinner 5 12
  round1C=$?

  guesswinner 4 13
  round1D=$?

  guesswinner 6 11
  round1E=$?

  guesswinner 3 14
  round1F=$?

  guesswinner 7 10
  round1G=$?

  guesswinner 2 15
  round1H=$?

  echo -e '\nround 2\n'

  guesswinner $round1A $round1B
  round2A=$?

  guesswinner $round1C $round1D
  round2B=$?

  guesswinner $round1E $round1F
  round2C=$?

  guesswinner $round1G $round1H
  round2D=$?

  echo -e '\nround 3\n'

  guesswinner $round2A $round2B
  round3A=$?

  guesswinner $round2C $round2D
  round3B=$?

  echo -e '\nround 4\n'

  guesswinner $round3A $round3B

  return $?
}
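
The same bracket could also be played with a loop over an array of seeds, pairing adjacent entries in each round. This is a sketch of an alternative structure, not the version used in the script; playbracket_loop is a hypothetical name, and guesswinner is repeated from above so the example runs on its own:

```bash
#!/bin/bash
# Loop-based sketch of playbracket. playbracket_loop is a hypothetical
# name; guesswinner is copied from the listing above.

function guesswinner {
  # $1 = team A rank
  # $2 = team B rank

  a=$(( 16 - $1 + 1 ))
  b=$(( 16 - $2 + 1 ))

  win=$( ( for teamA in $(seq 1 $a) ; do echo $1 ; done
  for teamB in $(seq 1 $b) ; do echo $2 ; done ) | shuf | head -1 )

  echo "$1 vs $2 : $win"
  return $win
}

function playbracket_loop {
  # $1 = name of bracket

  # first-round seeds in bracket order; adjacent pairs play each other
  local teams=( 1 16 8 9 5 12 4 13 6 11 3 14 7 10 2 15 )
  local round=1
  local winners i

  echo -e "\n___ $1 ___"

  # each pass halves the field: 16, 8, 4, 2, then 1 team left
  while [ ${#teams[@]} -gt 1 ] ; do
    echo -e "\nround $round\n"
    winners=()
    for (( i = 0 ; i < ${#teams[@]} ; i += 2 )) ; do
      guesswinner ${teams[i]} ${teams[i+1]}
      winners+=( $? )
    done
    teams=( "${winners[@]}" )
    round=$(( round + 1 ))
  done

  return ${teams[0]}
}

playbracket_loop 'Midwest'
echo "Midwest winner: $?"
```

The explicit version above is easier to read at a glance; the loop mainly saves typing and keeps the round-by-round pairing in one place.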

Finally, you need only call the playbracket function for each of the four regions. You are left with the "Final Four" with the winners of each bracket, but I'll leave the final determination of those contests for you to resolve on your own:


#!/bin/bash
# improved basketball March Madness prediction

function guesswinner {
    ...
}

function playbracket {
    ...
}

playbracket 'Midwest'
playbracket 'East'
playbracket 'West'
playbracket 'South'

Every time you run the script, you will generate a fresh NCAA March Madness basketball bracket. It's entirely random, so each iteration of the bracket will be different. Here's one sample run:


$ ./basketball2.sh

___ Midwest ___

round 1

1 vs 16 : 1
8 vs 9 : 9
5 vs 12 : 12
4 vs 13 : 4
6 vs 11 : 11
3 vs 14 : 3
7 vs 10 : 7
2 vs 15 : 2

round 2

1 vs 9 : 1
12 vs 4 : 4
11 vs 3 : 3
7 vs 2 : 7

round 3

1 vs 4 : 1
3 vs 7 : 7

round 4

1 vs 7 : 1


___ East ___

round 1

1 vs 16 : 16
8 vs 9 : 9
5 vs 12 : 5
4 vs 13 : 13
6 vs 11 : 6
3 vs 14 : 3
7 vs 10 : 10
2 vs 15 : 2

round 2

16 vs 9 : 9
5 vs 13 : 5
6 vs 3 : 3
10 vs 2 : 2

round 3

9 vs 5 : 5
3 vs 2 : 2

round 4

5 vs 2 : 2


___ West ___

round 1

1 vs 16 : 1
8 vs 9 : 8
5 vs 12 : 5
4 vs 13 : 4
6 vs 11 : 6
3 vs 14 : 3
7 vs 10 : 10
2 vs 15 : 15

round 2

1 vs 8 : 8
5 vs 4 : 5
6 vs 3 : 6
10 vs 15 : 10

round 3

8 vs 5 : 8
6 vs 10 : 10

round 4

8 vs 10 : 8


___ South ___

round 1

1 vs 16 : 1
8 vs 9 : 8
5 vs 12 : 5
4 vs 13 : 4
6 vs 11 : 6
3 vs 14 : 3
7 vs 10 : 7
2 vs 15 : 2

round 2

1 vs 8 : 1
5 vs 4 : 4
6 vs 3 : 6
7 vs 2 : 7

round 3

1 vs 4 : 4
6 vs 7 : 6

round 4

4 vs 6 : 4

In this sample run, my script selects team 1 in the Midwest, team 2 in the East, team 8 in the West, and team 4 in the South. More important, note that the rank 16 team won the first round against the rank 1 team in the East bracket. This could not happen in the script I posted last year. My bug is fixed!

The point of using a script to build your NCAA March Madness basketball bracket isn't to take away the fun of the game. On the contrary, since I don't have much familiarity with basketball, building my bracket programmatically allows me to participate in the office basketball pool. It's entertaining without requiring much familiarity with sports statistics. My script gives me a reason to follow the games, but without the emotional investment if my bracket doesn't perform well, and that's good enough for me.

Jim Hall is an open source software advocate and developer, probably best known as the founder of FreeDOS. Jim is also very active in usability testing for open source software projects like GNOME. At work, Jim is CEO of Hallmentum, an IT executive consulting company that helps CIOs and IT Leaders with strategic planning and organizational development.