Work the Shell - Mad Libs Generator, Part II

Choosing only the interesting words out of a text passage for a Mad Libs-style game proves to be a darn difficult task within a shell script, but Dave's up to the challenge.

Last month, we dug into creating a Mad Libs generator: a program that takes a snippet of English prose, selects words randomly and replaces them with their parts of speech, so friends or family can create their own amusing alternatives.

So, instead of “the quick brown fox jumps over the lazy dog”, it could be “the quick (( adjective )) fox jumps over the (( adjective )) dog”, for example.

The problem is that selecting random words from a sentence also can produce something far more boring, like “(( definite article )) quick brown fox jumps over (( definite article )) lazy dog”.

This month, I take that random word-selection tool and add some smarts so that it is biased toward longer words and words that are nouns or adjectives.

Selecting Words by Length

Last month, you'll recall that our script had a word-selection snippet that looked like this:

while read sentence ; do
  for word in $sentence ; do
    if [ $(( $RANDOM % $density )) -eq 1 ] ; then
      echo "(($word))"
    else
      echo "$word"
    fi
  done
done

We'll need to expand the code within the conditional that currently just puts the word in parentheses. The first step is to analyze length: if the word is three or fewer letters long, we'll be much less likely to select it:

if [ $(( $RANDOM % $density )) -eq 1 ] ; then
  length=$(/bin/echo -n "$word" | wc -c | sed 's/ //g')
  if [ "$length" -lt 4 -a $(( $RANDOM % 2 )) -eq 1 ] ; then
    echo \{$word\}    # too short
  else
    echo "(($word))"
  fi
else
  echo "$word"
fi

This works pretty well. Every time a word is selected, its length is checked, and words shorter than four letters have a 50% chance of being ignored. With a simple input sample, here's what we get:

{the} ((quick)) brown fox jumped ((over)) the lazy black dog

It's still not great, but at least it recognized that “the” wasn't interesting due to length. I'm still not entirely satisfied with which words it chooses to substitute, but let's move on to the second part of this project, testing part of speech, and come back to the selection criteria later.
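If you want to experiment with the length test on its own, here's a minimal sketch. The function names is_short and pick_marker are my own inventions for illustration, not part of the script, and I've swapped the wc -c pipeline for bash's built-in ${#word} length expansion. The optional second argument stands in for the coin flip so you can get repeatable results:

```shell
#!/bin/bash
# Sketch only: is_short and pick_marker are hypothetical helper names,
# not part of the original script.

# Succeeds (returns 0) when the word has fewer than four characters.
# Uses ${#1} rather than the wc -c pipeline shown in the column.
is_short() {
  [ "${#1}" -lt 4 ]
}

# Wraps an already-selected word in one of the column's markers:
# {word} when a short word is skipped, ((word)) when it is kept.
# The second argument replaces $(( RANDOM % 2 )) when supplied.
pick_marker() {
  local word=$1 coin=${2:-$(( RANDOM % 2 ))}
  if is_short "$word" && [ "$coin" -eq 1 ] ; then
    echo "{$word}"      # too short, skipped this time
  else
    echo "(($word))"    # kept for substitution
  fi
}

pick_marker the 1
pick_marker the 0
pick_marker quick 1
```

Leave off the second argument, and a short word like "the" is skipped roughly half the time, which matches the 50% behavior described above.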

Figuring Out the Part of Speech

The core code for this already was presented last month, utilizing Princeton's handy WordNet, so here it is:


pos="$(curl --silent "$dictionary$word" | grep '<h3>' | head -1 \
  | tr '[:upper:]' '[:lower:]' | sed 's/<h3>//;s/<\/h3>//')"
if [ ! -z "$(echo $pos | grep "not return any results")" ] ; then
  echo \[$word\]    # failed to figure out part of speech
else
  echo "((${word}:$pos))"
fi

Notice that we have to worry about failed lookups. Some words just aren't found in the WordNet dictionary, and we need to be prepared. I'll tie these together, as written, and here's what we get as an output:

Note: {} = too short, [] = POS undefined
((I:noun)) {am} {by} ((birth:noun)) {a} Genovese, and
{my} family {is} one of the most ((distinguished:verb))
of that ((republic:noun))

As the header reminds us, at this point, we're denoting words selected but skipped because they're too short with {} and those that have an undefined part of speech with [].

I've also changed the word replacement density factor to have more words tested. As you can see, most of the words in our sample input are now evaluated one way or the other.
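The extraction pipeline can be tried without hitting the network, which is handy when you're debugging. The following is only a sketch: extract_pos and tag_word are hypothetical helper names, and the HTML strings are made-up stand-ins for whatever the dictionary lookup returns, not real WordNet output. I've also assumed that an empty result should count as a failed lookup, which the code above doesn't check for:

```shell
#!/bin/bash
# Sketch only: extract_pos and tag_word are hypothetical helpers, and
# the HTML below is invented sample input, not real WordNet output.

# Reads HTML on stdin and prints the first <h3> heading in lowercase,
# using the same grep/head/tr/sed pipeline as the column's code.
extract_pos() {
  grep '<h3>' | head -1 \
    | tr '[:upper:]' '[:lower:]' | sed 's/<h3>//;s/<\/h3>//'
}

# Prints ((word:pos)) on success, or [word] when the lookup failed.
# Treating an empty pos as a failure is an added assumption.
tag_word() {
  local word=$1 pos=$2
  if [ -z "$pos" ] || echo "$pos" | grep -q "not return any results" ; then
    echo "[$word]"
  else
    echo "((${word}:$pos))"
  fi
}

found='<h2>Results</h2>
<h3>Noun</h3>
<h3>Verb</h3>'
missing='<h3>Your search did not return any results.</h3>'

tag_word birth "$(printf '%s\n' "$found" | extract_pos)"
tag_word and "$(printf '%s\n' "$missing" | extract_pos)"
```

Because head -1 keeps only the first heading, a word that is both a noun and a verb is reported with whichever part of speech the page lists first.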

Now, let's add a test so that only nouns or adjectives are eligible for substitution too:


if [ ! -z "$(echo $pos | grep "not return")" ] ; then
  echo \[$word\]        # failed to figure POS
else
  if [ -n "$(echo $pos | grep -E '(noun|adjective)')" ] ; then
    echo "((${word}:$pos))"
  else
    echo "<${word}:$pos>"
  fi
fi
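The same three-way decision also can be written as a case statement, which avoids spawning grep for every word. This is just a sketch of an alternative, not what the script uses, and classify is a hypothetical function name:

```shell
#!/bin/bash
# Sketch only: classify is a hypothetical helper name, and the case
# statement is a swapped-in alternative to the grep tests above.

# Given a word and the part-of-speech string looked up for it,
# prints the word wrapped in the matching marker.
classify() {
  local word=$1 pos=$2
  case $pos in
    *"not return"*)     echo "[$word]" ;;            # failed to figure POS
    *noun*|*adjective*) echo "((${word}:$pos))" ;;   # eligible for substitution
    *)                  echo "<${word}:$pos>" ;;     # uninteresting POS
  esac
}

classify republic noun
classify most adjective
classify distinguished verb
classify that "your search did not return any results"
```

The order of the patterns matters: the failure message is tested first so that a failed lookup never falls through to the other branches.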

I'll give the script that same first sentence of Mary Shelley's Frankenstein, and let's see what transpires:


Note: {} = too short, [] = POS undefined, <> = uninteresting POS 
I {am} <by:adverb> birth {a} Genovese, [and] my
family ((is:noun)) {one} {of} {the} ((most:adjective))
<distinguished:verb> {of} [that] ((republic:noun))

We're definitely getting there, but I think we still need to add something to the selection criteria—something that will help us produce more interesting Mad Libs.

But, let's leave that for next month as we've already dug through a lot of code in this column.

Dave Taylor has been hacking shell scripts for a really long time, 30 years. He's the author of the popular Wicked Cool Shell Scripts and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.
