Picking Out the Nouns

A reader wrote a letter to me (oh happy day!), and although I'm still not entirely sure what she's trying to accomplish, it's an interesting puzzle to try to tackle anyway. Here's what she asked:

I do not know how to code, but I have a project in mind that is something like Mad Libs, but is for dream interpretation. I would like for people to be able to type a dream, and then the computer program would pick out the nouns and ask the participants to freely associate anything that comes to mind if they were that object or person. Then, the computer would replace the typed responses back into the typed text for the surreal interpretation. Do you think this would be difficult to create?

Mad Libs for dreams? That's certainly a curious idea, particularly given how random and disconnected the elements of a dream often seem. Dreams have been seen as both visions from the gods and the playground of our subconscious and its need to resolve our daily experiences. And then there's Freud, who is pretty sure that if you aren't literally dreaming of cigars, it's because you're envious of people with cigars or because you're fixated on cigars but suppressing your interests.

OOohhhhkay then. No cigars, okay? And no Lewinsky jokes either.

What we need to accomplish this task is a script that parses input, identifies and creates a list of nouns, prompts users for their free-association synonyms for each of the nouns, then pushes out the original text again, replacing each original noun with a substitute as suggested by the user. To start, how do you identify nouns?

First, We'll Kill All the Nouns

I was going to grab the comprehensive dictionary from Princeton University's WordNet project, but closer examination reveals that it has more than 85,000 words, including all sorts of obscure alternative uses. The end result is that although it's comprehensive, it generates too many false hits. Instead, Desi Quintans offers a simple list of nouns, one word per line, that you can grab for our purpose here: http://www.desiquintans.com/downloads/nounlist.txt.

It's in exactly the format needed too:


$ head nounlist.txt
aardvark
abyssinian
accelerator
accordion
account
accountant
acknowledgment
acoustic
acrylic
act

Getting a usable noun list seems like it would be the most difficult step, but in fact, it's surprisingly easy given the almost infinite data store of the Internet.

Identifying Nouns in Prose

The next step is rather easy: given some prose, break it down into individual words, then test each word to identify which are nouns. This is really the bulk of the program, now that we have a noun list:


for word in $( sed 's/[[:punct:]]//g' $dream |
               tr '[A-Z]' '[a-z]' | tr ' ' '\n' )
do
  # is the word a noun? Let's look!
  if [ ! -z "$(grep -E "^${word}$" $nounlist)" ] ; then
    nouns="$nouns $word"
  fi
done

The for loop looks a bit complicated, but all it does is remove all punctuation from the input, translate uppercase to lowercase, and then convert each space into a newline. The result can be shown most easily by example. Let's say that we had this as input:


I've never seen a blue chipmunk!

Running it through the sed | tr | tr filter produces this:


ive
never
seen
a
blue
chipmunk

That's easy enough, and now that we can separate out each word from the input, it's easy to search the noun list to see if any match. The search needs a little care, though, because we need to ensure that we aren't getting embedded matches (for example, the slang word "stic" matching inside the noun "acoustic").

That's done by anchoring the search as a regular expression: ^ matches the beginning of the line, and $ matches the end of the line. Hence the regular expression ^${word}$, where the optional {} notation just tells the shell exactly where the variable name ends.
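If you'd rather not build the regular expression by hand, grep can do the whole-line comparison itself. Here's a minimal sketch, assuming a three-word stand-in for the noun list; the is_noun function and the sample_nouns file are names made up for this example and aren't part of the dreamer script:

```shell
#!/bin/sh
# Sketch: whole-line matching against a tiny stand-in noun list.
# sample_nouns and is_noun are hypothetical names used only here.

sample_nouns=$(mktemp /tmp/nouns.XXXXXX)
printf '%s\n' acoustic act owl > "$sample_nouns"

# is_noun WORD LISTFILE
# Succeeds only if WORD is identical to an entire line of LISTFILE.
#   -F  treat WORD as a literal string, not a regular expression
#   -x  require the match to span the whole line
#   -q  print nothing; only the exit status matters
is_noun() {
  grep -q -F -x -e "$1" "$2"
}

for candidate in owl stic acoustic
do
  if is_noun "$candidate" "$sample_nouns" ; then
    echo "$candidate: noun"
  else
    echo "$candidate: not in list"
  fi
done

rm -f "$sample_nouns"
```

This prints "owl: noun", "stic: not in list" and "acoustic: noun". A plain grep for "stic" would have matched inside "acoustic"; the -x flag has the same effect as wrapping the word in ^ and $.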

With some debugging code included, here's our first draft of this entire script:


#!/bin/sh

# dreamer - script to help interpret dreams. does this 
#    by asking users to describe their most recent 
#    dream, then prompts them to free associate
#    words for each of the nouns in their original description.

nounlist="nounlist.txt"
dream="/tmp/dreamer.$$"

input=""; nouns=""

trap "/bin/rm -f $dream" 0      # no tempfile left behind

echo "Welcome to Dreamer. To start, please describe in a few sentences the dream"
echo "you'd like to explore. End with DONE in all caps on its own line."

until [ "$input" = "DONE" -o "$input" = "done" ]
do
  echo "$input" >> $dream
  read input    # let's read another line from the user...
done

echo ""
echo "Okay. To confirm, your dream was about:"

cat $dream

echo "=============="

for word in $( sed 's/[[:punct:]]//g' $dream |
               tr '[A-Z]' '[a-z]' | tr ' ' '\n' )
do
  # is the word a noun? Let's look!
  if [ ! -z "$(grep -E "^${word}$" $nounlist)" ] ; then
    nouns="$nouns $word"
  fi
done

echo "Hmm.... okay. I have identified the following words as nouns:"
echo "$nouns"

echo "Are you ready to do some free association? Let's begin..."

for word in $nouns
do
  echo "What comes to mind when I say $word?"
done

exit 0

It's really broken into simple functional blocks: first prompting users to share their dream, then breaking down the prose into individual words and comparing them to the noun list, and finally (albeit not yet in its final form) prompting for the free association of each identified noun.
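As an aside, the word-by-word grep in the middle block could probably be collapsed into a single grep call. What follows is only a sketch of that alternative, not the script as written: find_nouns is a hypothetical helper, the two temp files are stand-ins for the dream text and the noun list, and tr -s '[:space:]' is swapped in so that tabs and runs of spaces split words too:

```shell
#!/bin/sh
# Sketch only: find_nouns is a hypothetical helper, not part of dreamer.

# find_nouns TEXTFILE LISTFILE
# Print every word of TEXTFILE that appears as a whole line in LISTFILE,
# one per line, in the order the words occur (repeats included).
find_nouns() {
  sed 's/[[:punct:]]//g' "$1" |
    tr '[:upper:]' '[:lower:]' |
    tr -s '[:space:]' '\n' |
    grep -F -x -f "$2"
}

# Stand-in files for the demo; the real script would pass its own
# $dream and $nounlist instead.
demo_dream=$(mktemp /tmp/dream.XXXXXX)
demo_nouns=$(mktemp /tmp/nouns.XXXXXX)
echo "I've never seen a blue chipmunk in a tree!" > "$demo_dream"
printf '%s\n' chipmunk owl tree > "$demo_nouns"

find_nouns "$demo_dream" "$demo_nouns"

rm -f "$demo_dream" "$demo_nouns"
```

With those stand-in files, this prints chipmunk and tree, one per line. The -f flag reads the whole noun list as a set of patterns, so grep runs once rather than once per word. That's only an aside, though; the first draft of dreamer keeps the loop.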

Let's run the script to see what I mean:


$ sh dreamer.sh
Welcome to Dreamer. To start, please describe in a few 
sentences the dream you'd like to explore. End with DONE 
in all caps on its own line.
I was sitting in a tree house in the middle of an ancient 
forest and an owl was staring at me. It asked "who?" and 
I woke up in a cold sweat.
DONE

Okay. To confirm, your dream was about:

I was sitting in a tree house in the middle of an ancient 
forest and an owl was staring at me. It asked "who?" and 
I woke up in a cold sweat.
==============
Hmm.... okay. I have identified the following words as nouns:
 tree house middle forest owl cold
Are you ready to do some free association? Let's begin...
What comes to mind when I say tree?
What comes to mind when I say house?
What comes to mind when I say middle?
What comes to mind when I say forest?
What comes to mind when I say owl?
What comes to mind when I say cold?

As is immediately obvious, the script doesn't read any answers yet: the free association section at the end and the subsequent reassembly of the prose with the new free-association words or phrases are still to come.

But that's a project for next month. Meanwhile, keep a dream journal and soon you'll be ready to interpret it thanks to the Linux shell—or something like that!

______________________

Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.