Teaching hash map key design — how you choose what to hash determines everything.
Given a list of strings, group the anagrams together. An anagram is a word formed by rearranging the letters of another.
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
# Answer (order of groups doesn't matter):
# [["eat","tea","ate"], ["tan","nat"], ["bat"]]
Anagrams are the same word with letters rearranged. That means two anagrams always produce the same result when you sort their letters. Use that sorted form as a dictionary key to group them.
sorted("eat") # → ['a', 'e', 't']
sorted("tea") # → ['a', 'e', 't'] (same key!)
sorted("ate") # → ['a', 'e', 't'] (same key!)
sorted("tan") # → ['a', 'n', 't'] (different group)
from collections import defaultdict
def group_anagrams_sort(words):
groups = defaultdict(list)
for word in words:
key = tuple(sorted(word)) # tuple is hashable; list is not
groups[key].append(word)
return list(groups.values())
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams_sort(words))
# [['eat','tea','ate'], ['tan','nat'], ['bat']]
| Property | Value |
|---|---|
| Time | O(n · k log k) where n = number of words, k = max word length |
| Space | O(n · k) for the dictionary |
defaultdict(list) automatically creates an empty list for any new key, so you never need to check if key in groups first.Instead of sorting (O(k log k) per word), count each character (O(k)). Use a tuple of 26 counts as the key — one count per letter of the alphabet.
from collections import defaultdict
def group_anagrams(words):
groups = defaultdict(list)
for word in words:
count = [0] * 26
for ch in word:
count[ord(ch) - ord('a')] += 1
groups[tuple(count)].append(word)
return list(groups.values())
print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat','tea','ate'], ['tan','nat'], ['bat']]
| Property | Value |
|---|---|
| Time | O(n · k) — no sorting, just counting |
| Space | O(n · k) for the dictionary |
ord('a') returns the ASCII code 97. Subtracting it gives index 0 for 'a', 1 for 'b', etc. — a clean way to map letters to array positions.Both solutions work. The character-count key is faster when words are long. The sort key is simpler to explain in an interview.