Become Queen Bee for a Day Using Python's Built-in Data Types

Python

Cheaters never win, but at least they can use Python.

Like many other nerds, I love word puzzles. I'm not always great at them, and I don't always have time to do them, but when I do, I really enjoy them.

I recently discovered a new daily puzzle, known as "spelling bee", that the New York Times offers online. The idea is simple. There are seven different letters, one in the center of a circle and six around it. Your job is to make as many different words as you can from those seven letters. Each word must be at least four letters long, and each word also must contain the center letter. You can use each letter as many times as you want.

So if the letters are "eoncylt", with a center letter of "y", some of the words you could create might be "cyclone", "eyelet" and "nylon".

The online game gives you a score based on how many words you've made from the potential pool. If you get them all, you're awarded "queen bee" status.

I do pretty well at this puzzle, but I've never managed to find all of the hidden words. Nevertheless, I have become queen bee on a few occasions. How? The answer is simple. I cheated. How? Using Python, of course.

Now, cheating at games isn't necessarily the first order of business when it comes to programming. And cheating at word games in which you're competing against yourself is probably a sign of unhealthy competition. But, doing so also provides a great way to review some of the ways you can use Python's built-in data types and the ease with which you can process words and text.

So in this article, I explore a number of ways you can cheat—and yes, become the queen bee, if only for a day.

Trying All Combinations

To start, you simply might try to form all of the possible combinations you can with the letters you're given. As you might remember from high-school math class, there's a difference between "permutations" and "combinations". When you generate "permutations", the order is important, but when you generate "combinations", the order is not important.

You easily can see this using Python's itertools module, a part of the standard library that has functions named permutations and combinations. Each takes both an iterable data structure and the number of items you want in each resulting list. For example:


>>> list(itertools.combinations(['a', 'b', 'c', 'd'], 2))
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'),
 ↪('c', 'd')]

>>> list(itertools.permutations(['a', 'b', 'c', 'd'], 2))
[('a', 'b'),
 ('a', 'c'),
 ('a', 'd'),
 ('b', 'a'),
 ('b', 'c'),
 ('b', 'd'),
 ('c', 'a'),
 ('c', 'b'),
 ('c', 'd'),
 ('d', 'a'),
 ('d', 'b'),
 ('d', 'c')]

As you can see, the output from combinations saw the order as unimportant and thus returned just ('a', 'b'). But permutations saw the order as important, and thus returned both ('a', 'b') and ('b', 'a').

You could use this to generate all of the letter combinations for possible words and then sift through that, right? Well, not really, for two reasons. First, the game allows you to repeat letters. And second, these functions let you specify a only single number of outputs.

You can solve the first problem by using the combinations_with_replacement function, which not only has a long name, but (as the name states) also allows letters to appear more than once in the output. For example:


>>>  list(itertools.combinations_with_replacement(['a', 'b',
 ↪'c', 'd'], 2))
[('a', 'a'),
 ('a', 'b'),
 ('a', 'c'),
 ('a', 'd'),
 ('b', 'b'),
 ('b', 'c'),
 ('b', 'd'),
 ('c', 'c'),
 ('c', 'd'),
 ('d', 'd')]

However, you want to find words that are at least four letters long. Let's assume that the longest possible word would be 12 letters long. You could use a for loop, appending the results from each iteration to a list. But an even more Pythonic way to do this would be to use a list comprehension—or even better, a nested list comprehension. Comprehensions are, in my experience, one of the hardest concepts for new Python developers to use. However, they are perfect for creating and transforming sequences, which is precisely what's happening here:


>>> one_combination
    for n in range(4, 13)
    for one_combination in
       itertools.combinations_with_replacement('abc', n)]

This code iterates over the range from 4 to 13, thus producing the integers from 4 through 12. For each of these values of n, you then produce all of the combinations, with replacement, of that length and with the letters "a", "b" and "c". The results then are output as a list.

This is great, but it's missing at least two things. First, you aren't interested in combinations, so much as words. And there aren't just a few letters for which you're searching, but seven letters—one of which must appear in the word.

So let's beef up the comprehension a bit in order to get all of the words:


>>> all_letters = 'eoncylt'
>>> center_letter = 'y'
>>> [''.join(one_combination)
    for n in range(4, 13)
    for one_combination in
       itertools.combinations_with_replacement(all_letters, n)
       if center_letter in one_combination]

The good news is that this now indeed has generated all of the possible combinations that might give you the answer. The bad news is that it's also generated a lot of combinations that aren't really words. Indeed, according to my count, this created 31,788 "words", including such greats as "occccccylt" and "eeeyt".

So yes, you could become the queen bee by going through and entering all of these words, one by one, as input into the game. Somehow though, I think entering 31,788 words takes the fun out of cheating.

Enter the Dictionary

To make cheating more efficient and fun, let's try a different strategy. Instead of generating all of the possible combinations of letters, let's instead search through only those combinations that are correct. How can you know what's correct? Via the dictionary, of course—and the fact that Linux comes with an English-language dictionary makes this easier.

Indeed, although it often can be useful to generate combinations, it's probably a wiser strategy just to start with the dictionary and choose the words that fit your needs.

The dictionary that I'm using for this example is in /usr/share/dict/american-english, and it contains 102,401 different words, each on a line by itself. The fact that each word is on a separate line turns out to be a great advantage, because it means that you can (once again) create a list comprehension. In this case, the source of the list comprehension won't be an iterator from itertools, but rather the dictionary file itself. Iterating over a file in Python gives you one line per iteration. Here's how you can get these words:


[one_word.strip()
    for one_word in open(words_file)]

But, wait a moment. This will return all of the words. You're interested only in those words that contain the seven letters in the puzzle, as well as the center letter.

One solution to this would be to write a function that checks whether the word fits your needs. For example:


>>>  def is_legal(word, all_letters, center_letter):
        word = word.strip()

        if len(word) < 4:
            return False

        if center_letter not in word:
            return False

        for one_letter in word:
            if one_letter not in all_letters:
                return False

        return True

The function would appear to work too:


>>> is_legal('cy', all_letters, center_letter)
    False

>>> is_legal('cycle', all_letters, center_letter)
    True

>>> is_legal('hairbrush', all_letters, center_letter)
    False

Armed with this function, you now can read through the dictionary and get the words that fit:


[one_word.strip()
    for one_word in open(words_file)
   if is_legal(one_word, all_letters, center_letter)]

Note the use of the strip method to remove leading and trailing whitespace from the word, mostly because of the newline that you'll get from each word in the file. For this reason, you'll also use strip in the is_legal function to ensure that you don't have to deal with the newline.

So, does it work? The answer is yes, mostly. On some days, my Python program finds words that weren't in the game. And on other days, the game is looking for words that aren't in the Linux dictionary. But for the most part, everything seems to work well, and although I try hard to win the game each day, there are definitely some days when I give up, run my program and feel smug at my programming skills, if not my word-game skills.

Conclusion

Whoever said that "cheaters never win" didn't take into consideration that cheating might lead to a better understanding of programming and data structures. And indeed, I often tell people that the key to good programming is often not knowing the best algorithms, but rather knowing which libraries to apply and how to combine the strengths of the language you're using, so that you can do as little work as possible. This article shows how to tackle a simple real-world problem that involved very little of your own code. Rather, understanding the problem, Python's standard library and its data structures combined to give the answers.

Reuven Lerner teaches Python, data science and Git to companies around the world. You can subscribe to his free, weekly "better developers" e-mail list, and learn from his books and courses at http://lerner.co.il. Reuven lives with his wife and children in Modi'in, Israel.

Load Disqus comments

Storix backup and disaster recovery

 

Corporate Patron

Pulseway Logo

Limited Time Offer

September Cover

 

Take Linux Journal for a test drive. Download our September issue for FREE.