Back to Python Examples

Interview Challenge: Find the Duplicate Number

A classic problem with a surprising optimal solution — four approaches from brute force to mind-blowing.

Contents

  1. The Problem
  2. Constraints & Why They Matter
  3. Solution 1 — Brute Force O(n²)
  4. Solution 2 — Sort and Scan O(n log n)
  5. Solution 3 — Hash Set O(n) time, O(n) space
  6. The "Aha!" Moment — Seeing the Cycle
  7. Solution 4 — Floyd's Cycle Detection O(n) time, O(1) space
  8. Step-by-Step Trace
  9. Summary & Comparison

1. The Problem

You are given a list of n + 1 integers. Every integer in the list is between 1 and n inclusive. Exactly one number appears more than once. Find that number.

# Example 1
nums = [1, 3, 4, 2, 2]
# Answer: 2

# Example 2
nums = [3, 1, 3, 4, 2]
# Answer: 3

# Example 3
nums = [1, 1]
# Answer: 1

Simple enough? Here is what makes this question interesting — the constraints:

2. Constraints & Why They Matter

ConstraintWhat it rules out
Time complexity must be better than O(n²)Nested loops (brute force) — too slow
Space complexity must be O(1) — constant extra memoryStoring elements in a hash set or extra array
You must not modify the original listSorting the list in place
Most candidates can reach Solution 3 (hash set) fairly quickly. The real interview challenge is finding a solution that satisfies all three constraints at once. That requires a completely different way of thinking about the problem.

Work through the solutions in order — each one hits a wall against the constraints, which motivates the next approach.

3. Solution 1 — Brute Force

The first instinct: compare every number against every other number. If two match, that is the duplicate.

def find_duplicate_brute(nums):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return nums[i]

# Test
print(find_duplicate_brute([1, 3, 4, 2, 2]))  # 2
print(find_duplicate_brute([3, 1, 3, 4, 2]))  # 3
PropertyValuePasses?
Time complexityO(n²) — two nested loops❌ Violates the O(n²) rule
Space complexityO(1) — no extra storage
Modifies input?No
The outer loop picks each element once. The inner loop compares it to every element that comes after it. On a list of 10,000 items this is ~50 million comparisons. Too slow.

4. Solution 2 — Sort and Scan

After sorting, the duplicate will always be adjacent to itself. One pass through the sorted list finds it in O(n) time — but sorting costs O(n log n) overall.

def find_duplicate_sort(nums):
    sorted_nums = sorted(nums)  # Creates a new sorted copy — O(n log n)
    for i in range(1, len(sorted_nums)):
        if sorted_nums[i] == sorted_nums[i - 1]:
            return sorted_nums[i]

# Test
print(find_duplicate_sort([1, 3, 4, 2, 2]))  # 2
print(find_duplicate_sort([3, 1, 3, 4, 2]))  # 3
Using sorted() instead of nums.sort() creates a new list and leaves the original untouched — so we avoid the "no modifying the input" problem.
PropertyValuePasses?
Time complexityO(n log n) — dominated by the sort✅ Better than O(n²)
Space complexityO(n) — sorted() creates a copy of the list❌ Violates O(1) space
Modifies input?No (using sorted())
We traded the time problem for a space problem. The sorted copy uses O(n) memory — which rules this out.

5. Solution 3 — Hash Set

Walk through the list once. Keep a set of numbers seen so far. The first number that is already in the set is the duplicate.

def find_duplicate_set(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return num
        seen.add(num)

# Test
print(find_duplicate_set([1, 3, 4, 2, 2]))  # 2
print(find_duplicate_set([3, 1, 3, 4, 2]))  # 3
PropertyValuePasses?
Time complexityO(n) — one pass, set lookup is O(1) average
Space complexityO(n) — the set can hold up to n items❌ Violates O(1) space
Modifies input?No
This is clean, fast, and most people consider it the "good" answer. But we still fail the space constraint. The set grows as the list grows — O(1) space means we need to solve this with a fixed amount of memory, no matter how large the input is.

At this point in a real interview, the follow-up is: "Can you do it in O(1) space?" This is where the problem gets interesting.

6. The "Aha!" Moment — Seeing the Cycle

To crack this, you need to look at the list in a completely different way. Stop thinking of it as a list of values. Start thinking of it as a linked list of directions.

Every index in the list points to the next index to visit — like a chain of arrows. The value at each position tells you where to jump next.

# nums = [1, 3, 4, 2, 2]
# Index:   0  1  2  3  4

# Start at index 0.  nums[0] = 1 → jump to index 1
# At index 1.        nums[1] = 3 → jump to index 3
# At index 3.        nums[3] = 2 → jump to index 2
# At index 2.        nums[2] = 4 → jump to index 4
# At index 4.        nums[4] = 2 → jump to index 2  ← already visited!
# At index 2.        nums[2] = 4 → jump to index 4
# At index 4.        nums[4] = 2 → jump to index 2  ← loop forever

Visualized as a path:

0 → 1 → 3 → 2 → 4
                ↑       ↓
                └───────┘
               cycle entry = 2 = the duplicate!

Why does a cycle always form? Because the duplicate value means two different indices point to the same next index. Like two roads merging into one — from that point on, you are trapped going around the same loop forever.

The duplicate number is always the entry point of the cycle. The cycle entry is the index that gets pointed to by two different positions — which is exactly the duplicate value.

Now the problem is transformed: find the entry point of a cycle in a linked list. That is a solved problem — Floyd's Cycle Detection Algorithm.

7. Solution 4 — Floyd's Cycle Detection

Floyd's algorithm (also called the "tortoise and hare") uses two pointers that move at different speeds through the sequence. The fast pointer moves two steps at a time; the slow pointer moves one step at a time. If there is a cycle, they will eventually meet inside it.

Phase 1: Detect the cycle (find any meeting point inside it)

slow = nums[0]    # tortoise: one step at a time
fast = nums[0]    # hare: two steps at a time

while True:
    slow = nums[slow]           # one step
    fast = nums[nums[fast]]    # two steps
    if slow == fast:
        break                    # they met — we are inside the cycle

Phase 2: Find the cycle entry point (the duplicate)

# Reset one pointer to the start. Move both one step at a time.
# The point where they meet is the cycle entry = the duplicate.
slow = nums[0]    # reset to beginning

while slow != fast:
    slow = nums[slow]
    fast = nums[fast]

return slow    # = fast = the duplicate

Complete function

def find_duplicate(nums):
    # Phase 1: find intersection point inside the cycle
    slow = nums[0]
    fast = nums[0]
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break

    # Phase 2: find the cycle entry = the duplicate
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]

    return slow

# Test all examples
print(find_duplicate([1, 3, 4, 2, 2]))  # 2
print(find_duplicate([3, 1, 3, 4, 2]))  # 3
print(find_duplicate([1, 1]))             # 1
PropertyValuePasses?
Time complexityO(n) — both phases traverse at most n steps
Space complexityO(1) — only two pointer variables
Modifies input?No
All three constraints satisfied. The solution is 15 lines of code using nothing but two integer variables.

8. Step-by-Step Trace

Walking through [1, 3, 4, 2, 2] so you can see exactly what happens:

Phase 1 — finding the meeting point

# nums = [1, 3, 4, 2, 2]
# Index:   0  1  2  3  4

# Start: slow = nums[0] = 1,  fast = nums[0] = 1

# Step 1:
#   slow = nums[slow] = nums[1] = 3
#   fast = nums[nums[fast]] = nums[nums[1]] = nums[3] = 2
#   slow=3, fast=2  — not equal, keep going

# Step 2:
#   slow = nums[3] = 2
#   fast = nums[nums[2]] = nums[4] = 2
#   slow=2, fast=2  — EQUAL, break!

# Meeting point is 2. We are somewhere inside the cycle.

Phase 2 — finding the cycle entry

# Reset slow to start. Keep fast at meeting point (2).
# slow = nums[0] = 1,  fast = 2

# Step 1:
#   slow = nums[1] = 3
#   fast = nums[2] = 4
#   slow=3, fast=4  — not equal

# Step 2:
#   slow = nums[3] = 2
#   fast = nums[4] = 2
#   slow=2, fast=2  — EQUAL, done!

# Return 2. Correct!

Why does Phase 2 always find the entry point?

This is the mathematical magic of Floyd's algorithm. When the two pointers meet inside the cycle at the end of Phase 1, the meeting point has a specific property: the distance from the start of the list to the cycle entry equals the distance from the meeting point to the cycle entry (going forward around the cycle).

So when you reset one pointer to the beginning and advance both one step at a time, they travel equal distances and arrive at the cycle entry simultaneously.

You do not need to memorize this proof for an interview. What matters is understanding why the cycle exists (the duplicate creates two pointers to the same index) and knowing the two-phase structure. If you can explain the intuition, that is impressive enough.

9. Summary & Comparison

SolutionTimeSpaceModifies?All constraints?
Brute force (nested loops)O(n²)O(1)No
Sort and scanO(n log n)O(n)No*
Hash setO(n)O(n)No
Floyd's cycle detectionO(n)O(1)No

* Sort and scan uses sorted() which creates a copy — O(n) space.

What interviewers are really testing

What to say in the interview

# Start by restating the problem and constraints:
# "So we have n+1 values in range 1..n, one duplicate,
#  and we need O(n) time and O(1) space without modifying the list."

# Walk through why each approach fails before jumping to Floyd's.
# Then introduce the cycle idea:
# "If we treat each value as a pointer to the next index,
#  the duplicate creates a cycle — and finding the cycle entry
#  gives us the duplicate."

# Code it in two clear phases with comments.
# After coding, trace through an example by hand.
Even if you blanked on Floyd's, a strong candidate who thoroughly explains brute force, hash set, and articulates why those fail will score well. The process of reasoning matters as much as the answer.