Back to Python Examples

Group Anagrams

Teaching hash map key design — how you choose what to hash determines everything.

Contents

  1. The Problem
  2. The Key Insight
  3. Solution 1 — Sort as Key
  4. Solution 2 — Character Count as Key
  5. Summary

1. The Problem

Given a list of strings, group the anagrams together. An anagram is a word formed by rearranging the letters of another.

words = ["eat", "tea", "tan", "ate", "nat", "bat"]

# Answer (order of groups doesn't matter):
# [["eat","tea","ate"], ["tan","nat"], ["bat"]]

2. The Key Insight

Anagrams are the same word with letters rearranged. That means two anagrams always produce the same result when you sort their letters. Use that sorted form as a dictionary key to group them.

sorted("eat")  # → ['a', 'e', 't']
sorted("tea")  # → ['a', 'e', 't']  (same key!)
sorted("ate")  # → ['a', 'e', 't']  (same key!)
sorted("tan")  # → ['a', 'n', 't']  (different group)

3. Solution 1 — Sort Each Word as the Key

from collections import defaultdict

def group_anagrams_sort(words):
    groups = defaultdict(list)

    for word in words:
        key = tuple(sorted(word))  # tuple is hashable; list is not
        groups[key].append(word)

    return list(groups.values())

words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams_sort(words))
# [['eat','tea','ate'], ['tan','nat'], ['bat']]
PropertyValue
TimeO(n · k log k) where n = number of words, k = max word length
SpaceO(n · k) for the dictionary
defaultdict(list) automatically creates an empty list for any new key, so you never need to check if key in groups first.

4. Solution 2 — Character Count as Key

Instead of sorting (O(k log k) per word), count each character (O(k)). Use a tuple of 26 counts as the key — one count per letter of the alphabet.

from collections import defaultdict

def group_anagrams(words):
    groups = defaultdict(list)

    for word in words:
        count = [0] * 26
        for ch in word:
            count[ord(ch) - ord('a')] += 1
        groups[tuple(count)].append(word)

    return list(groups.values())

print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat','tea','ate'], ['tan','nat'], ['bat']]
PropertyValue
TimeO(n · k) — no sorting, just counting
SpaceO(n · k) for the dictionary
ord('a') returns the ASCII code 97. Subtracting it gives index 0 for 'a', 1 for 'b', etc. — a clean way to map letters to array positions.

5. Summary

Both solutions work. The character-count key is faster when words are long. The sort key is simpler to explain in an interview.

The broader lesson: the design of your hash key determines correctness and performance. Any two items that should be in the same group must produce identical keys. This thinking applies to many grouping and deduplication problems.