I’m busy working my way through Think Python and doing the excercises when I found an especially interesting one:
Exercise 12-6: Another ‘Car Talk Puzzler’:
What is the longest English word, that remains a valid English word, as you remove its letters one at a time? Now, letters can be removed from either end, or the middle, but you can’t rearrange any of the letters. Every time you drop a letter, you wind up with another English word. If you do that, you’re eventually going to wind up with one letter and that too is going to be an English word—one that’s found in the dictionary. I want to know what’s the longest word and how many letters does it have? I’m going to give you a little modest example: Sprite. Ok? You start off with sprite, you take a letter off, one from the interior of the word, take the r away, and we’re left with the word spite, then we take the e off the end, we’re left with spit, we take the s off, we’re left with pit, it, and I.
The answer, found on the Think Python site, consists of creating a dictionary, and then for every word in it:
- Generate a list of all the words that can be made from it by removing one letter.
- Check if each of these entries exists, and if so recursively check that word.
The end result is a list ending in a single letter word, and success. Finally, the largest list of all the words is passed out for display.
Using a dictionary, the result is quite fast – on my computer, it took about 2.2 seconds to read in and process 275k entries (which consisted of the Scrabble SOWPODS ‘official’ words list). But of course, I was wondering if it could be optimized.
Several changes came to mind:
- Since we want the largest entries, we ignore smaller ones as we find matches. For example, if a five letter word is a winner, then we only test words greater than five letters in the future.
- We know that eventually the one word answer has to be either ‘A’ or ‘I’, the only two single-letter words in English. For this reason, we can skip words not containing those letters. In our test file that means ignoring about 16% of the words – not much, but it adds up.
- The routine is simpler – scanWord(array) is called on each entry in the dictionary (turned into a single entry array). The function takes that entry and creates a list of words which are missing one letter from it, and checks them against the dictionary (for example, passing in ‘DOG’ would create ‘OG’, ‘DG’, and ‘DO’). If any match, it’s added to the array, which is then passed recursively to scanWord(). If not, it’s on to the next entry. And of course, if none match, an empty array is passed back, the flag for failure (and a full array passed back is success).
Here’s my optimized code:
def make_word_dict(): # this function copied from 'Think Python' exercise 12-6 that contains the words as keys.""" d = dict() fin = open("scrabble-sowpods.txt") for line in fin: word = line.strip().lower() d[word] = word for letter in ['a', 'i', '']: d[letter] = letter return d def scanWord(a): w=a[len(a)-1] i=len(w) for k in range(0,i): x=w[:k]+w[k+1:] if (x in d): r=list(a) r.append(x) if (1==len(x)): return r r=scanWord(r) if (len(r)>0): return r return  d=make_word_dict() matchLen=-1 matchList= for w in d: if (len(w)>matchLen): if ( 'a' in w or 'i' in w ): a=[w] l=scanWord(a) if (len(l)>0): matchLen=len(w) matchList=l s=','.join(matchList)+'n' print s
Using this the code the timing was about 0.88 sec versus 2.2 for the other one. However, since both loaded in the word list first (make_word_dict() – about 0.2 sec), the result was really 0.68 vs 2.0 on the processing – about 3X speed increase.
And the winner? An 11 character entry:
austringers, astringers, stringers, stingers, singers, singes, sines, sies, sis, is, i
Of course, it wasn’t the only one – here’s the other 11 character entries found:
breastfeeds, breastfeed, breastfed, breasted, reasted, easted, eased, ease, eas, as, a
carabineers, carabiners, carabines, carbines, carbies, caries, cries, cris, cis, is, i
carabineros, carabiners, carabines, carbines, carbies, caries, cries, cris, cis, is, i
carabiniers, carabiners, carabines, carbines, carbies, caries, cries, cris, cis, is, i
complecting, completing, competing, compting, comping, coping, oping, ping, pig, pi, i
housellings, houselling, houseling, housling, housing, hosing, hoing, hing, hin, in, i
scratchiest, scratchies, scratches, cratches, ratches, raches, aches, aces, ace, ae, a
streamlings, streamings, steamings, teamings, tamings, tamins, amins, ains, ins, is, i
unpreaching, upreaching, preaching, peaching, peching, eching, ehing, hing, hin, in, i
I’m starting to like Python…