A classic problem with a surprising optimal solution — four approaches from brute force to mind-blowing.
You are given a list of n + 1 integers. Every integer in the list is between 1 and n inclusive. Exactly one number appears more than once. Find that number.
# Example 1
nums = [1, 3, 4, 2, 2]
# Answer: 2
# Example 2
nums = [3, 1, 3, 4, 2]
# Answer: 3
# Example 3
nums = [1, 1]
# Answer: 1
Simple enough? Here is what makes this question interesting — the constraints:
| Constraint | What it rules out |
|---|---|
| Time complexity must be better than O(n²) | Nested loops (brute force) — too slow |
| Space complexity must be O(1) — constant extra memory | Storing elements in a hash set or extra array |
| You must not modify the original list | Sorting the list in place |
Work through the solutions in order — each one hits a wall against the constraints, which motivates the next approach.
The first instinct: compare every number against every other number. If two match, that is the duplicate.
def find_duplicate_brute(nums):
for i in range(len(nums)):
for j in range(i + 1, len(nums)):
if nums[i] == nums[j]:
return nums[i]
# Test
print(find_duplicate_brute([1, 3, 4, 2, 2])) # 2
print(find_duplicate_brute([3, 1, 3, 4, 2])) # 3
| Property | Value | Passes? |
|---|---|---|
| Time complexity | O(n²) — two nested loops | ❌ Violates the O(n²) rule |
| Space complexity | O(1) — no extra storage | ✅ |
| Modifies input? | No | ✅ |
After sorting, the duplicate will always be adjacent to itself. One pass through the sorted list finds it in O(n) time — but sorting costs O(n log n) overall.
def find_duplicate_sort(nums):
sorted_nums = sorted(nums) # Creates a new sorted copy — O(n log n)
for i in range(1, len(sorted_nums)):
if sorted_nums[i] == sorted_nums[i - 1]:
return sorted_nums[i]
# Test
print(find_duplicate_sort([1, 3, 4, 2, 2])) # 2
print(find_duplicate_sort([3, 1, 3, 4, 2])) # 3
sorted() instead of nums.sort() creates a new list and leaves the original untouched — so we avoid the "no modifying the input" problem.| Property | Value | Passes? |
|---|---|---|
| Time complexity | O(n log n) — dominated by the sort | ✅ Better than O(n²) |
| Space complexity | O(n) — sorted() creates a copy of the list | ❌ Violates O(1) space |
| Modifies input? | No (using sorted()) | ✅ |
Walk through the list once. Keep a set of numbers seen so far. The first number that is already in the set is the duplicate.
def find_duplicate_set(nums):
seen = set()
for num in nums:
if num in seen:
return num
seen.add(num)
# Test
print(find_duplicate_set([1, 3, 4, 2, 2])) # 2
print(find_duplicate_set([3, 1, 3, 4, 2])) # 3
| Property | Value | Passes? |
|---|---|---|
| Time complexity | O(n) — one pass, set lookup is O(1) average | ✅ |
| Space complexity | O(n) — the set can hold up to n items | ❌ Violates O(1) space |
| Modifies input? | No | ✅ |
At this point in a real interview, the follow-up is: "Can you do it in O(1) space?" This is where the problem gets interesting.
To crack this, you need to look at the list in a completely different way. Stop thinking of it as a list of values. Start thinking of it as a linked list of directions.
Every index in the list points to the next index to visit — like a chain of arrows. The value at each position tells you where to jump next.
# nums = [1, 3, 4, 2, 2]
# Index: 0 1 2 3 4
# Start at index 0. nums[0] = 1 → jump to index 1
# At index 1. nums[1] = 3 → jump to index 3
# At index 3. nums[3] = 2 → jump to index 2
# At index 2. nums[2] = 4 → jump to index 4
# At index 4. nums[4] = 2 → jump to index 2 ← already visited!
# At index 2. nums[2] = 4 → jump to index 4
# At index 4. nums[4] = 2 → jump to index 2 ← loop forever
Visualized as a path:
0 → 1 → 3 → 2 → 4
↑ ↓
└───────┘
cycle entry = 2 = the duplicate!
Why does a cycle always form? Because the duplicate value means two different indices point to the same next index. Like two roads merging into one — from that point on, you are trapped going around the same loop forever.
Now the problem is transformed: find the entry point of a cycle in a linked list. That is a solved problem — Floyd's Cycle Detection Algorithm.
Floyd's algorithm (also called the "tortoise and hare") uses two pointers that move at different speeds through the sequence. The fast pointer moves two steps at a time; the slow pointer moves one step at a time. If there is a cycle, they will eventually meet inside it.
slow = nums[0] # tortoise: one step at a time
fast = nums[0] # hare: two steps at a time
while True:
slow = nums[slow] # one step
fast = nums[nums[fast]] # two steps
if slow == fast:
break # they met — we are inside the cycle
# Reset one pointer to the start. Move both one step at a time.
# The point where they meet is the cycle entry = the duplicate.
slow = nums[0] # reset to beginning
while slow != fast:
slow = nums[slow]
fast = nums[fast]
return slow # = fast = the duplicate
def find_duplicate(nums):
# Phase 1: find intersection point inside the cycle
slow = nums[0]
fast = nums[0]
while True:
slow = nums[slow]
fast = nums[nums[fast]]
if slow == fast:
break
# Phase 2: find the cycle entry = the duplicate
slow = nums[0]
while slow != fast:
slow = nums[slow]
fast = nums[fast]
return slow
# Test all examples
print(find_duplicate([1, 3, 4, 2, 2])) # 2
print(find_duplicate([3, 1, 3, 4, 2])) # 3
print(find_duplicate([1, 1])) # 1
| Property | Value | Passes? |
|---|---|---|
| Time complexity | O(n) — both phases traverse at most n steps | ✅ |
| Space complexity | O(1) — only two pointer variables | ✅ |
| Modifies input? | No | ✅ |
Walking through [1, 3, 4, 2, 2] so you can see exactly what happens:
# nums = [1, 3, 4, 2, 2]
# Index: 0 1 2 3 4
# Start: slow = nums[0] = 1, fast = nums[0] = 1
# Step 1:
# slow = nums[slow] = nums[1] = 3
# fast = nums[nums[fast]] = nums[nums[1]] = nums[3] = 2
# slow=3, fast=2 — not equal, keep going
# Step 2:
# slow = nums[3] = 2
# fast = nums[nums[2]] = nums[4] = 2
# slow=2, fast=2 — EQUAL, break!
# Meeting point is 2. We are somewhere inside the cycle.
# Reset slow to start. Keep fast at meeting point (2).
# slow = nums[0] = 1, fast = 2
# Step 1:
# slow = nums[1] = 3
# fast = nums[2] = 4
# slow=3, fast=4 — not equal
# Step 2:
# slow = nums[3] = 2
# fast = nums[4] = 2
# slow=2, fast=2 — EQUAL, done!
# Return 2. Correct!
This is the mathematical magic of Floyd's algorithm. When the two pointers meet inside the cycle at the end of Phase 1, the meeting point has a specific property: the distance from the start of the list to the cycle entry equals the distance from the meeting point to the cycle entry (going forward around the cycle).
So when you reset one pointer to the beginning and advance both one step at a time, they travel equal distances and arrive at the cycle entry simultaneously.
| Solution | Time | Space | Modifies? | All constraints? |
|---|---|---|---|---|
| Brute force (nested loops) | O(n²) | O(1) | No | ❌ |
| Sort and scan | O(n log n) | O(n) | No* | ❌ |
| Hash set | O(n) | O(n) | No | ❌ |
| Floyd's cycle detection | O(n) | O(1) | No | ✅ |
* Sort and scan uses sorted() which creates a copy — O(n) space.
# Start by restating the problem and constraints:
# "So we have n+1 values in range 1..n, one duplicate,
# and we need O(n) time and O(1) space without modifying the list."
# Walk through why each approach fails before jumping to Floyd's.
# Then introduce the cycle idea:
# "If we treat each value as a pointer to the next index,
# the duplicate creates a cycle — and finding the cycle entry
# gives us the duplicate."
# Code it in two clear phases with comments.
# After coding, trace through an example by hand.