Taken from http://nafiulis.me/the-deceptive-anagram-question.html
Problem
Given a list of words L, find all the anagrams in L in the order in which they appear in L.
Example
The word list:
[“pool”, “loco”, “cool”, “stain”, “satin”, “pretty”, “nice”, “loop”]
The output should be, in exactly this order:
[“pool”, “loco”, “cool”, “stain”, “satin”, “loop”]
Solution
I started reading the above post when I thought I should stop reading and try it myself. I too, started with a very naive approach that was looping over the list in exponential time. Gross.
As I read more, he said the interviewer suggested getting it to O(N) time by using a hash map. I stopped reading once more, and went back to my half-baked problem.
I came up with something that processes the output as it’s reading words in. It’s still rough, but it doesn’t loop back over the list of words, which is nice.
Step 1: Read in the words, remembering their place in the original string
Step 2: Create a hash of the word that will map similar words to the same hash
I figured I’d create a hash of a word that took each letter, and the count of each letter as it appeared in the word, the sort the letters and place the count next to it.
Example: The word “cool” would become “c1l1o2” since “c” and “l” only appear once, and “o” appears twice. That would also map “loco” to the same hash. Nice.
Why did I choose this hash method? It seemed right at the time and I originally thought it would save space. Upon reading more of the original blog post, Nafiul ends up creating a hash of a word by sorting the letters alphabetically, which is much faster/nicer. I’m keeping my hashing function as-is for full-disclosure.
Step 3: As each word is read in, create the hash, and store the word in the set of hashes.
As you can see, when I add the word to the hash map of words, I append it as the word, plus its original place in line. I’ll use that later when constructing the output.
Step 4: Build the output string
Does it work?
Yes, it does. It’s still a hot mess. Problems I see are:
- My “word:index” works, but I’m not in love with it. That value has to be stored somewhere, so alongside the word is probably fine, but, it feels messy. I could just do an
.index_of
lookup on the originalword_list
, but that’s technically a scan of the original word list, which would happen for every word I’m going to output. - My hashing-function is a bit verbose/overkill when sorting the letters of a word would also work. I’m keeping the original here.
Full code
How about a refactor?
Knowing what I know now, and seeing the problems, let’s do some refactoring.
Hashing function
Let’s first simplify the function to do a letter-sort. It now becomes much shorter:
Wow, I’m embarrassed.
Wrap/Unwrap functions
Why didn’t I just use a hash, or an array?
embarassment++
Refactored full code
Now that the code is cleaner/leaner, here’s the full dump:
Yeesh, that looks a lot better. Live, code and learn.