Interview Challenge: Find the Duplicate Number

A classic problem with a surprising optimal solution — four approaches from brute force to mind-blowing.

The Problem
Constraints & Why They Matter
Solution 1 — Brute Force O(n²)
Solution 2 — Sort and Scan O(n log n)
Solution 3 — Hash Set O(n) time, O(n) space
The "Aha!" Moment — Seeing the Cycle
Solution 4 — Floyd's Cycle Detection O(n) time, O(1) space
Step-by-Step Trace
Summary & Comparison

1. The Problem

You are given a list of n + 1 integers. Every integer in the list is between 1 and n inclusive. Exactly one number appears more than once. Find that number.

# Example 1
nums = [1, 3, 4, 2, 2]
# Answer: 2

# Example 2
nums = [3, 1, 3, 4, 2]
# Answer: 3

# Example 3
nums = [1, 1]
# Answer: 1

Simple enough? Here is what makes this question interesting — the constraints:

2. Constraints & Why They Matter

Constraint	What it rules out
Time complexity must be better than O(n²)	Nested loops (brute force) — too slow
Space complexity must be O(1) — constant extra memory	Storing elements in a hash set or extra array
You must not modify the original list	Sorting the list in place

Most candidates can reach Solution 3 (hash set) fairly quickly. The real interview challenge is finding a solution that satisfies all three constraints at once. That requires a completely different way of thinking about the problem.

Work through the solutions in order — each one hits a wall against the constraints, which motivates the next approach.

3. Solution 1 — Brute Force

The first instinct: compare every number against every other number. If two match, that is the duplicate.

def find_duplicate_brute(nums):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return nums[i]

# Test
print(find_duplicate_brute([1, 3, 4, 2, 2]))  # 2
print(find_duplicate_brute([3, 1, 3, 4, 2]))  # 3

Property	Value	Passes?
Time complexity	O(n²) — two nested loops	❌ Violates the O(n²) rule
Space complexity	O(1) — no extra storage	✅
Modifies input?	No	✅

The outer loop picks each element once. The inner loop compares it to every element that comes after it. On a list of 10,000 items this is ~50 million comparisons. Too slow.

4. Solution 2 — Sort and Scan

After sorting, the duplicate will always be adjacent to itself. One pass through the sorted list finds it in O(n) time — but sorting costs O(n log n) overall.

def find_duplicate_sort(nums):
    sorted_nums = sorted(nums)  # Creates a new sorted copy — O(n log n)
    for i in range(1, len(sorted_nums)):
        if sorted_nums[i] == sorted_nums[i - 1]:
            return sorted_nums[i]

# Test
print(find_duplicate_sort([1, 3, 4, 2, 2]))  # 2
print(find_duplicate_sort([3, 1, 3, 4, 2]))  # 3

Using sorted() instead of nums.sort() creates a new list and leaves the original untouched — so we avoid the "no modifying the input" problem.

Property	Value	Passes?
Time complexity	O(n log n) — dominated by the sort	✅ Better than O(n²)
Space complexity	O(n) — `sorted()` creates a copy of the list	❌ Violates O(1) space
Modifies input?	No (using `sorted()`)	✅

We traded the time problem for a space problem. The sorted copy uses O(n) memory — which rules this out.

5. Solution 3 — Hash Set

Walk through the list once. Keep a set of numbers seen so far. The first number that is already in the set is the duplicate.

def find_duplicate_set(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return num
        seen.add(num)

# Test
print(find_duplicate_set([1, 3, 4, 2, 2]))  # 2
print(find_duplicate_set([3, 1, 3, 4, 2]))  # 3

Property	Value	Passes?
Time complexity	O(n) — one pass, set lookup is O(1) average	✅
Space complexity	O(n) — the set can hold up to n items	❌ Violates O(1) space
Modifies input?	No	✅

This is clean, fast, and most people consider it the "good" answer. But we still fail the space constraint. The set grows as the list grows — O(1) space means we need to solve this with a fixed amount of memory, no matter how large the input is.

At this point in a real interview, the follow-up is: "Can you do it in O(1) space?" This is where the problem gets interesting.

6. The "Aha!" Moment — Seeing the Cycle

To crack this, you need to look at the list in a completely different way. Stop thinking of it as a list of values. Start thinking of it as a linked list of directions.

Every index in the list points to the next index to visit — like a chain of arrows. The value at each position tells you where to jump next.

# nums = [1, 3, 4, 2, 2]
# Index:   0  1  2  3  4

# Start at index 0.  nums[0] = 1 → jump to index 1
# At index 1.        nums[1] = 3 → jump to index 3
# At index 3.        nums[3] = 2 → jump to index 2
# At index 2.        nums[2] = 4 → jump to index 4
# At index 4.        nums[4] = 2 → jump to index 2  ← already visited!
# At index 2.        nums[2] = 4 → jump to index 4
# At index 4.        nums[4] = 2 → jump to index 2  ← loop forever

Visualized as a path:

0 → 1 → 3 → 2 → 4
                ↑       ↓
                └───────┘
               cycle entry = 2 = the duplicate!

Why does a cycle always form? Because the duplicate value means two different indices point to the same next index. Like two roads merging into one — from that point on, you are trapped going around the same loop forever.

The duplicate number is always the entry point of the cycle. The cycle entry is the index that gets pointed to by two different positions — which is exactly the duplicate value.

Now the problem is transformed: find the entry point of a cycle in a linked list. That is a solved problem — Floyd's Cycle Detection Algorithm.

7. Solution 4 — Floyd's Cycle Detection

Floyd's algorithm (also called the "tortoise and hare") uses two pointers that move at different speeds through the sequence. The fast pointer moves two steps at a time; the slow pointer moves one step at a time. If there is a cycle, they will eventually meet inside it.

Phase 1: Detect the cycle (find any meeting point inside it)

slow = nums[0]    # tortoise: one step at a time
fast = nums[0]    # hare: two steps at a time

while True:
    slow = nums[slow]           # one step
    fast = nums[nums[fast]]    # two steps
    if slow == fast:
        break                    # they met — we are inside the cycle

Phase 2: Find the cycle entry point (the duplicate)

# Reset one pointer to the start. Move both one step at a time.
# The point where they meet is the cycle entry = the duplicate.
slow = nums[0]    # reset to beginning

while slow != fast:
    slow = nums[slow]
    fast = nums[fast]

return slow    # = fast = the duplicate

Complete function

def find_duplicate(nums):
    # Phase 1: find intersection point inside the cycle
    slow = nums[0]
    fast = nums[0]
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break

    # Phase 2: find the cycle entry = the duplicate
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]

    return slow

# Test all examples
print(find_duplicate([1, 3, 4, 2, 2]))  # 2
print(find_duplicate([3, 1, 3, 4, 2]))  # 3
print(find_duplicate([1, 1]))             # 1

Property	Value	Passes?
Time complexity	O(n) — both phases traverse at most n steps	✅
Space complexity	O(1) — only two pointer variables	✅
Modifies input?	No	✅

All three constraints satisfied. The solution is 15 lines of code using nothing but two integer variables.

8. Step-by-Step Trace

Walking through [1, 3, 4, 2, 2] so you can see exactly what happens:

Phase 1 — finding the meeting point

# nums = [1, 3, 4, 2, 2]
# Index:   0  1  2  3  4

# Start: slow = nums[0] = 1,  fast = nums[0] = 1

# Step 1:
#   slow = nums[slow] = nums[1] = 3
#   fast = nums[nums[fast]] = nums[nums[1]] = nums[3] = 2
#   slow=3, fast=2  — not equal, keep going

# Step 2:
#   slow = nums[3] = 2
#   fast = nums[nums[2]] = nums[4] = 2
#   slow=2, fast=2  — EQUAL, break!

# Meeting point is 2. We are somewhere inside the cycle.

Phase 2 — finding the cycle entry

# Reset slow to start. Keep fast at meeting point (2).
# slow = nums[0] = 1,  fast = 2

# Step 1:
#   slow = nums[1] = 3
#   fast = nums[2] = 4
#   slow=3, fast=4  — not equal

# Step 2:
#   slow = nums[3] = 2
#   fast = nums[4] = 2
#   slow=2, fast=2  — EQUAL, done!

# Return 2. Correct!

Why does Phase 2 always find the entry point?

This is the mathematical magic of Floyd's algorithm. When the two pointers meet inside the cycle at the end of Phase 1, the meeting point has a specific property: the distance from the start of the list to the cycle entry equals the distance from the meeting point to the cycle entry (going forward around the cycle).

So when you reset one pointer to the beginning and advance both one step at a time, they travel equal distances and arrive at the cycle entry simultaneously.

You do not need to memorize this proof for an interview. What matters is understanding why the cycle exists (the duplicate creates two pointers to the same index) and knowing the two-phase structure. If you can explain the intuition, that is impressive enough.

9. Summary & Comparison

Solution	Time	Space	Modifies?	All constraints?
Brute force (nested loops)	O(n²)	O(1)	No	❌
Sort and scan	O(n log n)	O(n)	No*	❌
Hash set	O(n)	O(n)	No	❌
Floyd's cycle detection	O(n)	O(1)	No	✅

* Sort and scan uses sorted() which creates a copy — O(n) space.

What interviewers are really testing

Can you recognize that the obvious solutions fail the constraints? — Correctly identifying WHY each approach is insufficient shows analytical depth.
Can you reframe the problem? — The leap from "list of values" to "linked list of pointers" is the key insight. This kind of lateral thinking is what separates good engineers from great ones.
Can you explain your reasoning? — Talking through the cycle visualization and the two-phase approach matters as much as the code itself.

What to say in the interview

# Start by restating the problem and constraints:
# "So we have n+1 values in range 1..n, one duplicate,
#  and we need O(n) time and O(1) space without modifying the list."

# Walk through why each approach fails before jumping to Floyd's.
# Then introduce the cycle idea:
# "If we treat each value as a pointer to the next index,
#  the duplicate creates a cycle — and finding the cycle entry
#  gives us the duplicate."

# Code it in two clear phases with comments.
# After coding, trace through an example by hand.

Even if you blanked on Floyd's, a strong candidate who thoroughly explains brute force, hash set, and articulates why those fail will score well. The process of reasoning matters as much as the answer.

Interview Challenge: Find the Duplicate Number

Contents

1. The Problem

2. Constraints & Why They Matter

3. Solution 1 — Brute Force

4. Solution 2 — Sort and Scan

5. Solution 3 — Hash Set

6. The "Aha!" Moment — Seeing the Cycle

7. Solution 4 — Floyd's Cycle Detection

Phase 1: Detect the cycle (find any meeting point inside it)

Phase 2: Find the cycle entry point (the duplicate)

Complete function

8. Step-by-Step Trace

Phase 1 — finding the meeting point

Phase 2 — finding the cycle entry

Why does Phase 2 always find the entry point?

9. Summary & Comparison

What interviewers are really testing

What to say in the interview