The Anagram Problem

Taken from http://nafiulis.me/the-deceptive-anagram-question.html

Problem

Given a list of words L, find all the anagrams in L in the order in which they appear in L.

Example

The word list:

[“pool”, “loco”, “cool”, “stain”, “satin”, “pretty”, “nice”, “loop”]

The output should be, in exactly this order:

[“pool”, “loco”, “cool”, “stain”, “satin”, “loop”]

Solution

I started reading the above post when I thought I should stop reading and try it myself. I too, started with a very naive approach that was looping over the list in exponential time. Gross.

As I read more, he said the interviewer suggested getting it to O(N) time by using a hash map. I stopped reading once more, and went back to my half-baked problem.

I came up with something that processes the output as it’s reading words in. It’s still rough, but it doesn’t loop back over the list of words, which is nice.

Step 1: Read in the words, remembering their place in the original string

Step 2: Create a hash of the word that will map similar words to the same hash

I figured I’d create a hash of a word that took each letter, and the count of each letter as it appeared in the word, the sort the letters and place the count next to it.

Example: The word “cool” would become “c1l1o2” since “c” and “l” only appear once, and “o” appears twice. That would also map “loco” to the same hash. Nice.

Why did I choose this hash method? It seemed right at the time and I originally thought it would save space. Upon reading more of the original blog post, Nafiul ends up creating a hash of a word by sorting the letters alphabetically, which is much faster/nicer. I’m keeping my hashing function as-is for full-disclosure.

Step 3: As each word is read in, create the hash, and store the word in the set of hashes.

As you can see, when I add the word to the hash map of words, I append it as the word, plus its original place in line. I’ll use that later when constructing the output.

Step 4: Build the output string

Does it work?

Yes, it does. It’s still a hot mess. Problems I see are:

My “word:index” works, but I’m not in love with it. That value has to be stored somewhere, so alongside the word is probably fine, but, it feels messy. I could just do an .index_of lookup on the original word_list, but that’s technically a scan of the original word list, which would happen for every word I’m going to output.
My hashing-function is a bit verbose/overkill when sorting the letters of a word would also work. I’m keeping the original here.

Full code

How about a refactor?

Knowing what I know now, and seeing the problems, let’s do some refactoring.

Hashing function

Let’s first simplify the function to do a letter-sort. It now becomes much shorter:

Wow, I’m embarrassed.

Wrap/Unwrap functions

Why didn’t I just use a hash, or an array?

embarassment++

Refactored full code

Now that the code is cleaner/leaner, here’s the full dump:

Yeesh, that looks a lot better. Live, code and learn.