Work the Shell - Making a Mad Libs Generator
My son is at the age when he's decomposing sentences, diagramming them and learning about the parts of speech. Me? I couldn't differentiate between an adverb and an adjective if a wet, smelly red ball smacked me in the head. That's why I have an editor!
There are games for everything, however, and one of the best games for learning the parts of speech is a simple one that's been around since I was a kid: Mad Libs. You know what I'm talking about. It takes simple sentences like “When my dog is happy, he jumps and barks, his tail wagging a mile a minute.” and transforms them into “When my [ noun ] is [ adjective ], he [ verb ] and [ verb ], his [ noun ] wagging a mile a [ noun ].”
The question is, can we write a shell script that can perform this sort of transformation? The answer, of course, is yes.
There are two challenges with this project: figuring out which words to replace with their parts of speech and figuring out the part of speech of a given word. Let's tackle these in reverse order.
It turns out that a number of different Web sites let you look up a word and offer its definition and part of speech. The one I use for this exercise is from Princeton, because it's fast, easy to parse and easy to query.
To look up the part of speech of, say, “dog”, the URL to invoke is simply wordnetweb.princeton.edu/perl/webwn?s=dog.
The result highlights the part of speech as an h3 line, so isolating that element is a breeze:
curl --silent "$lookup$word" | grep '<h3>'
This particular word demonstrates one of the nuances of the problem: many words have more than one part of speech, demonstrated by the difference between a pet dog and someone who is dogging your every footstep. Sure enough, the result:
<h3>Noun</h3> </ul><h3>Verb</h3>
For simplicity's sake, let's just take the first match, easily done by adding | head -1 to the pipe. Next, let's drop it all into lowercase and strip out the HTML:
| tr '[:upper:]' '[:lower:]' | sed 's/<h3>//;s/<\/h3>//'
Both of these are worth a bit of explanation. You might well have seen tr '[A-Z]' '[a-z]' as the more common way to transliterate uppercase to lowercase, and that works just fine, if you're working in English. Using the character classes “[:upper:]” and “[:lower:]” is a more portable alternative that's preferred.
The sed command also lets you specify more than one command to apply by simply separating them with a semicolon. What we have here is a substitution of <h3> to a null string (that is, removing it), followed by the same thing for </h3>.
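As a quick check of both points, the sketch below runs a sample line (hard-coded here rather than fetched) through the same tr and sed steps. It also shows that the semicolon form gives the same result as passing each substitution with its own -e flag. The variable names sample, one and two exist only for this illustration.

```shell
# Sample line, hard-coded for illustration instead of fetched with curl.
sample='<h3>Noun</h3>'

# Semicolon-separated commands in a single sed script.
one=$(printf '%s\n' "$sample" | tr '[:upper:]' '[:lower:]' | sed 's/<h3>//;s/<\/h3>//')

# The same two substitutions, each given with its own -e flag.
two=$(printf '%s\n' "$sample" | tr '[:upper:]' '[:lower:]' | sed -e 's/<h3>//' -e 's/<\/h3>//')

echo "$one"
echo "$two"
```

Both lines of output should read noun.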
That's all we need to get the part of speech. For example:
$ lookup="http://wordnetweb.princeton.edu/perl/webwn?s="
$ word="happy"
$ curl --silent "$lookup$word" | grep '<h3>' | tr '[:upper:]' '[:lower:]' | sed 's/<h3>//;s/<\/h3>//'
adjective
And, the hard part's done!
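The lookup steps can be collected into a small function. This is only a sketch: the function names extract_pos and lookup_pos are made up for this example, and it assumes the page still marks each part of speech with an h3 element. It uses grep -o (widely available, though not required by POSIX) so that only the first h3 element is kept even when several appear on one line.

```shell
lookup="http://wordnetweb.princeton.edu/perl/webwn?s="

# extract_pos (hypothetical helper): read HTML on stdin and print the
# first <h3> heading in lowercase with the tags removed.
extract_pos() {
  grep -o '<h3>[^<]*</h3>' | head -n 1 |
    sed 's/<h3>//;s/<\/h3>//' | tr '[:upper:]' '[:lower:]'
}

# lookup_pos (hypothetical helper): fetch the page for the word in $1
# and print its first listed part of speech. Needs network access.
lookup_pos() {
  curl --silent "$lookup$1" | extract_pos
}

# Offline check using the two-heading result shown earlier for "dog".
printf '%s\n' '<h3>Noun</h3> </ul><h3>Verb</h3>' | extract_pos
```

The offline check prints noun. With network access, and if the site still answers in the same format, lookup_pos happy would be expected to print adjective.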
For this article, let's use a replacement density constant to figure out whether any given word should be replaced. The higher the density, the more likely a given word in the input stream will be replaced by its part of speech.
This is lazy and not a great solution, because it can match “is” or “the” just as easily as “dog” or “tail”, but let's go with it for now to get a sense of how it'll all fit together. We'll come back to it and improve the sophistication of the selection criteria later. With me? Good!
For a given word, deciding whether to substitute its part of speech can be calculated as follows, assuming we have a variable called density that has an integer value greater than 1 (with a density of 1, the remainder is always 0 and the test never succeeds):
if [ $(( $RANDOM % $density )) = 1 ] ; then
$RANDOM is one of those cool magic variables in Bash (it isn't part of the original Bourne shell, so check what your sh really is) that has a different value each time you reference it. Handy!
Let's put these together and see what we get. We'll use an initial density of 5, which theoretically should mean that if we have a properly random $RANDOM, each word should have a 1:5 chance of being replaced.
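To see why the remainder test gives a one-in-density chance, the sketch below feeds every integer from 0 to 99 through the same test in place of $RANDOM and counts the hits. The function name should_replace is an assumption for this example; in the real script the first argument would be $RANDOM.

```shell
density=5

# should_replace (hypothetical helper): succeed when the number in $1
# leaves a remainder of exactly 1 after division by the density in $2.
should_replace() {
  [ $(( $1 % $2 )) -eq 1 ]
}

# Stand in for $RANDOM with the values 0..99 so the count is repeatable.
hits=0
i=0
while [ "$i" -lt 100 ] ; do
  if should_replace "$i" "$density" ; then
    hits=$(( hits + 1 ))
  fi
  i=$(( i + 1 ))
done

echo "$hits of 100 values pass the test"
```

Exactly one remainder out of five matches, so 20 of the 100 values pass. With a real random source, the count will vary from run to run but should stay close to the same proportion.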
The script needs to read the input word by word, testing each word as it goes. This can be done easily with the following loop structure, assuming that the text input comes from stdin:
while read sentence ; do
  for word in $sentence ; do
Now, we add the random conditional and have a skeleton ready to test:
while read sentence ; do
  for word in $sentence ; do
    if [ $(( $RANDOM % $density )) -eq 1 ] ; then
      echo "(($word))"
    else
      echo $word
    fi
  done
done
You can see that at this stage we're simply marking the words we're planning on replacing by wrapping them in “(( ))”. Here's a quick test:
$ echo this is a test mad-lib input | sh make-madlib.sh
this
is
((a))
test
((mad-lib))
input
One tiny tweak before I wrap it up for the month—how do we get the words to appear on the same line? It's easy. Remember that each of the code loops is essentially a little script of its own, so this task can be accomplished by adding four characters to the very end of the outermost loop:
  done
done | fmt
That's all you have to do—add the |fmt after the second done statement. Now when it's run:
$ echo this is a test mad-lib input | sh make-madlib.sh
this is a ((test)) ((mad-lib)) input
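Putting the pieces from this installment together, a sketch of the whole script might look like the following. The function name mark_words and the idea of passing the density as an argument are assumptions made here for illustration; the loop and the trailing fmt are the ones described above. Because $RANDOM is not available in every shell that can be installed as sh, the sketch falls back to 0 when it is unset, in which case no word is ever marked.

```shell
# mark_words (hypothetical helper): read text on stdin, wrap roughly
# one word in every $1 with (( )), then rejoin the words with fmt.
mark_words() {
  density=${1:-5}
  while read -r sentence ; do
    # $sentence is deliberately unquoted so the shell splits it into
    # words; note that a word such as * would also be glob-expanded.
    for word in $sentence ; do
      if [ $(( ${RANDOM:-0} % density )) -eq 1 ] ; then
        printf '((%s))\n' "$word"
      else
        printf '%s\n' "$word"
      fi
    done
  done | fmt
}

echo this is a test mad-lib input | mark_words 5
```

In a shell that provides $RANDOM, the marked words differ from run to run. Stripping the parentheses from the output should always give back the original sentence.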
Next month, we'll add the part of speech lookup code into the conditional and then spend some time exploring a more sophisticated word choice algorithm. Clearly, picking words purely at random isn't the best approach.
Dave Taylor has been hacking shell scripts for a really long time, 30 years. He's the author of the popular Wicked Cool Shell Scripts and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.