Spinning and Text Processing

I have a dirty secret to share, and I hope you won't think less of me once you learn it. I used to be in the internet marketing world and pitched my coaching programs and DVD sets from stages around the United States. Yes, for $999, I'd teach you how to make money online, and if you were one of the first three to sign up, I'd even throw in my friend's dynamite ebook absolutely free!

Truth is, I didn't last long in that space because I'm much more of a do-er than a salesperson, and it would bug me to no end when people would buy my coaching package—at 20% off, but only if you sign up right now!—and then never actually open it and use it to at least try their hand at creating an online business.

That's all in the past, fortunately, but I've retained an interest in those business opportunity pitches and what they're actually selling. Just like the cliché envelope-stuffing job (you know: "Send me $200 in an envelope, and I'll show you how to ask people to send you money!"), it turns out that a lot of online businesses are still predicated on gaming search engines to gain traffic to pages selling daft and usually worthless things.

And one way these entrepreneurs game Google and other search engines is by "spinning": producing lots and lots of content from a single article that they've paid someone a few bucks to write in the first place.

It's all rather uninspiring, except that the spinning idea itself is quite interesting, and for a long time I've been toying with writing a shell script that makes article spinning easy. There are more prosaic, less questionable uses for this technology too, such as programs or even games whose text messages would benefit from some variety.

The {idea|concept|inspiration} is that each time you'd use a {word|phrase} you instead list a set of {similar words|synonyms|alternative words} and the software automatically picks one {randomly|at random}.

So the previous sentence might come out of the spinner as "The idea is that each time you'd use a phrase you instead list a set of alternative words and the software automatically picks one at random." Run it again, and you could get a different mix. Got it? Easy enough.
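To make the "picks one at random" step concrete, here's a minimal sketch of how a single word-choice block could be resolved. It assumes bash (the $RANDOM variable isn't part of POSIX sh), and pick_one is a hypothetical helper name I'm using for illustration, not something from a finished script:

```shell
#!/bin/bash
# Sketch only: choose one alternative from a block such as
# "idea|concept|inspiration". pick_one is a hypothetical helper name.

pick_one()
{
  choices="$1"

  # The number of alternatives is one more than the number of "|" separators.
  bars=$(printf '%s' "$choices" | tr -cd '|' | wc -c)
  count=$(( bars + 1 ))

  # $RANDOM (bash) yields 0..32767; reduce it to a field number 1..count.
  pick=$(( RANDOM % count + 1 ))

  printf '%s\n' "$choices" | cut -d'|' -f"$pick"
}

pick_one "idea|concept|inspiration"
```

Each run prints one of the three alternatives. Because cut works on whole fields, multi-word choices like "at random" come through intact.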

A more advanced spinner might actually tap a thesaurus and, each time it sees a word it knows, push out a set of synonyms automatically, which the spinner script then narrows down to one random choice each time it's invoked.
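Here's a rough sketch of what that thesaurus pass could look like. Everything in it is an assumption for illustration: expand_synonyms is a hypothetical name, and the three-entry lookup table stands in for a real thesaurus file:

```shell
#!/bin/bash
# Sketch only: wrap known words in {a|b} word-choice blocks.
# expand_synonyms and its tiny table are hypothetical; a real version
# would load synonyms from a thesaurus file instead.

expand_synonyms()
{
  awk '
    BEGIN {
      syn["idea"]     = "idea|concept|inspiration"
      syn["word"]     = "word|phrase"
      syn["randomly"] = "randomly|at random"
    }
    {
      # Replace each whitespace-separated word that has an entry.
      for (i = 1; i <= NF; i++)
        if ($i in syn) $i = "{" syn[$i] "}"
      print
    }'
}

echo "The idea is to swap each word" | expand_synonyms
# prints: The {idea|concept|inspiration} is to swap each {word|phrase}
```

This only matches exact words, so "word." with trailing punctuation slips through, and awk squeezes runs of spaces down to one. Good enough to show the idea, though.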

In fact, go read spam blog comments or spam email, and you'll see the output of this sort of contextless sentence manipulation. The results can be...weird, like this:

she's got arriving in can easily dresses, still Beth may be 36 yr old men's city servant, outdoors of waking time 'en femme'. she's single, symmetrical in addition thinks to achieve marital, "Eventually..."

But hey, just because there are bad uses doesn't mean it's not an interesting project to try to code, right? I trust you to exercise good judgment of your own when you explore this script, okay?

Spinning Out the Spinner

The basic tasks of the script are straightforward: parse the input, isolate each word-choice block, pick one at random, then reassemble everything and display it.

To make things a bit easier, I'm going to start by using fmt to make each paragraph one really long line. That way, I can then break the input into lines that don't have a word-choice block and lines that do:

fmt -w$bigwidth "$1" | tr '{' '\n' | tr '}' '\n'

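For example, this input:

An input line like {this|demo} would then transform.

comes out as three lines, with the word-choice block alone on the middle one:

An input line like
this|demo
would then transform.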

See how that works? I'm going to use fmt again at the end of the process to clean up the output.
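Putting those pieces together, the whole flow might be sketched like this. It's not a finished script, just one way the steps could fit, and it rests on a few assumptions: bash for $RANDOM, a bigwidth value of 2000 (my guess at "wide enough"; GNU fmt rejects widths above 2500), a single paragraph of input, and the shortcut of treating any split line that contains a "|" as a word-choice block. The names spin and pick_one are hypothetical:

```shell
#!/bin/bash
# Sketch only: split on braces, pick one choice per block, reassemble.
# spin and pick_one are hypothetical names; bigwidth=2000 is an assumption.

bigwidth=2000

pick_one()
{
  # Number of choices is one more than the number of "|" separators.
  count=$(( $(printf '%s' "$1" | tr -cd '|' | wc -c) + 1 ))
  printf '%s\n' "$1" | cut -d'|' -f$(( RANDOM % count + 1 ))
}

spin()
{
  # Join the paragraph into one long line, then put every brace-delimited
  # block on a line of its own.
  fmt -w"$bigwidth" "$@" | tr '{' '\n' | tr '}' '\n' | \
  while IFS= read -r line ; do
    case "$line" in
      *"|"*) printf '%s' "$(pick_one "$line")" ;;  # word-choice block
      *)     printf '%s' "$line" ;;                # plain text, passed through
    esac
  done
  echo    # finish the reassembled line with a newline
}

# The final fmt re-wraps the reassembled paragraph to a normal width.
printf '%s\n' 'The {idea|concept} is that the software picks one {randomly|at random}.' | spin | fmt
```

The pieces are written back without newlines, so the spaces that sat on either side of the braces are what keep the words apart. The catch with the "contains a |" shortcut is that plain text with a literal pipe character would be mistaken for a block.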


Dave Taylor has been hacking shell scripts for over thirty years. Really. He's the author of the popular "Wicked Cool Shell Scripts" and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.