Words from Letter Cubes

I got a great letter from a reader with a puzzle to solve, so let's dig in, shall we? Here's what he wrote:

Love your column in Linux Journal. I've read it for years and learned a lot about shell scripting, but not quite enough to solve a puzzle on my own.

My parents have a set of 12 or so wooden blocks on a shelf in their guest room, and each block's face has a letter or symbol printed on it. For years I've arranged the blocks to spell a new message whenever I visit, and I got to thinking that I should be able to program a script to give me all the possible words that can be spelled concurrently with this set of blocks.

I started with making an array for each block where each position holds a letter. My thought was to go through the set of 12 blocks, and for each block, print the six letters (or a space for non-letters) in about the same way I'd loop through a row/column grid, and then compare the results to a dictionary to find valid words. But my solution would handle only one combination of blocks. How do I account for all possible block combinations?

This is an interesting puzzle, partly because of the number of combinations involved. If we start with the math, each block is going to have six sides, which means that there are 612 possible combinations of letters you can create—a total of 2,176,782,336 different possibilities.

Now, let's assume that you say "Yikes!" and decide you want to constrain yourself to only six-letter words. That's just as crazy, because the first block can be any of the 12, then the second one of the remaining 11, and so on. So, 12*11*10*9*8*7 gives us the number of block combinations (which adds up to 665,280 possibilities). But each block can show any of six sides, so it's many times more—really, not much of a reduction at all.

If we could check 1,000 words/second, it'd still take years to go through every combination. So we need to do something smarter than just a brute-force lookup in the dictionary.

Where we really can reduce the number of possibilities is by deciding that each block has an identical set of letters, so that for each slot in the word, there are only six possible options. That means that we have 6*6*6*6*6*6 possibilities for a six-letter word, or 46,656.

That's better! Let's assume that each block is identical and has the letters e, t, a, o, l and n, which turn out to be the six most common letters in the English language, in order, according to Wikipedia.

Note: making all the blocks have identical letters dramatically reduces the complexity of our search, because now order doesn't count, but real wooden-letter blocks vary, so although this may be a good assumption for this column, it's unlikely to match the actual blocks at your parents' house.

To check for four-letter words that could be constructed from these blocks, we now could do a regular-expression search across a fully enumerated English language dictionary:


$ grep -E '^[etaoln][etaoln][etaoln][etaoln]$' dictionary

For testing purposes, the dictionary I'm going to use is the "linuxwords" dictionary available on SourceForge.net: http://sourceforge.net/projects/souptonuts/files/souptonuts/dictionary.

The results are:


alee
aloe
anal
anon
ante
lane
late
lean
lent
loan
lone
loon
loot
neat
neon
none
noon
note
onto
tale
tall
teen
tell
tent
toll
tone
tool

That's easy enough.

Even better, we can reduce the regular-expression notation and make it much easier to check for longer words by using this regex:


^[etaoln]{4}$

The {4} notation indicates that the set has to match four times in a row for it to be valid, the ^ matches the beginning of the line, and $ is the end of the line.

If we want to check six-letter words, it's easy enough:


$ grep -E '^[etaoln]{6}$' dictionary
allele
atonal
latent
nettle
talent
tattoo
tenant
$

Let's go for broke and try longer words too:


$ grep -E '^[etaoln]{7}$' dictionary
antenna
$ grep -E '^[etaoln]{8}$' dictionary
annotate
antennae
neonatal

We didn't find any matches for 9–12 occurrences, so we probably won't be able to use all the blocks in a single, Scrabble-killer word.

The next logical step is to have a map of all letters on all blocks, but then we can see that if "A" is the set of all letters on the first block, "B" on the second, "C" on the third, and so on, that means we need to test for five-letter words against: ABCDE, ABCDF, ABCDG, ABCDH, ABCDI, ..., ABCDL, ABCEDF and so on.

Remember that "A" might be [abcde], B might be [cdefgh], and so on, depending on how the blocks actually are labeled. In short, that's a crazy lot of possibilities, so we're back into something where brute-force simply won't work.

To look at this differently, what if each block had only one letter on every face. Now we're talking Scrabble tiles, where your rack might be the seven letters AEFRGHM. How many words can you make from that?

It turns out that that's a much easier question. One way to figure it out is simply to generate all possible combinations of those letters and test them against the dictionary (a brute-force solution), which would be 7*6*5*4*3*2*1 (7!) possibilities, or about 5000 seven-letter combinations.

For this, we could do a first filter of the dictionary by using the same regular expression:


grep -E '^[aefrghm]{7}$' dictionary

This reveals that there are in fact two words that match these criteria: grammar and referee. But, that's not right! Why?

Because the regular expression doesn't take into account that once a letter is used, it cannot be used a second time. There's no way to spell "referee" with one e, however much we'd like to have it so.

So now what? Well, I did say that the regular expression was just a starting point, right? So the second step is to check the possibilities identified against the actual pool of available letters.

To do that, let's do something that's possibly non-intuitive. Let's sort the letters in each word so that they're in alphabetical order, then sort the pool of available letters in alphabetical order too. For example, "grammar" becomes "aagmmrr".

How? It turns out the code is easy:


$ echo grammar | grep -o . | sort |tr -d "\n"

But, I've run out of space, so stay tuned. We'll come back to this topic next time.

Acknowledgement

Special thanks to my friend Jesse Stay for his help with regular expressions and that last letter-sorting trick.

Dave Taylor has been hacking shell scripts on UNIX and Linux systems for a really long time. He's the author of Learning Unix for Mac OS X and Wicked Cool Shell Scripts. You can find him on Twitter as @DaveTaylor, and you can reach him through his tech Q&A site: Ask Dave Taylor.

Load Disqus comments