Finishing Up the Content Spinner

You'll recall that in my last article I shared a long, complex explanation for why spam email catches my attention and intrigues me, perhaps more than it should. Part of it is that I've been involved in email forever—I even wrote one of the most popular old-school email programs back in the day. But, there's also just the puzzle factor of taking a massive data set of millions of records and trying to produce "personalized" messages on such a large scale.

The easy version of this is to have named data fields like ${firstname}, so you can open your email with "Dear ${firstname}, I heard you went to ${college}? Me too!" and so on.

But, I'm more interested in the "spinning" side of things—the production of prose that has built-in synonyms, as exemplified by:


The {idea|concept|inspiration} is that each time you'd use a
{word|phrase} you instead list a set of {similar words|synonyms|
alternative words} and the software automatically picks one
{randomly|at random} and is done.

I know, you're likely shaking your head and wondering "what the deuce happened to Dave?", but humor me, let's explore this together as a text-processing puzzle.

In my June 2016 column, I presented the core building blocks of the article spinner, a script that could identify the {} surrounded choices, isolate them, count how many options were present and display it to the user as debugging output.

So, the above would be displayed as:


$ sh spinner.sh spinme.txt
The
3 options, spinning --- idea|concept|inspiration
is that each time you'd use a
2 options, spinning --- word|phrase
you instead list a set of
3 options, spinning --- similar words|synonyms|alternative words
and the software automatically picks one
2 options, spinning --- randomly|at random
and is done.

That's a good start, but this time, let's finish the job and actually pick randomly from the set of choices each time, output only the selected option and reflow the text to make it all look good.

______________________

Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.