Remove Duplicates While Preserving Order in List
Company: Google
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
##### Scenario
A data pipeline receives an unordered list of IDs containing duplicates; downstream components require a duplicate-free list while preserving original arrival order.
##### Question
Write a Python function remove_duplicates(lst) that deletes all duplicate elements from a list in place (or returns a new list) while keeping only the first occurrence of each item.
##### Hints
Iterate while maintaining a set of seen elements; avoid O(n^
2) solutions.
Quick Answer: This question evaluates a candidate's competency in data deduplication, handling duplicate elements while preserving original order, and awareness of algorithmic time and space complexity.
Given a list of integer IDs lst, return a new list that contains each value from lst exactly once, preserving the order of first occurrence. Keep only the first time each value appears and discard subsequent duplicates.
Constraints
- 0 <= len(lst) <= 200000
- -10^9 <= lst[i] <= 10^9
- Preserve order of first occurrence
- Aim for O(n) average time and O(n) extra space using a hash set
- Return a new list
Solution
from typing import List
def remove_duplicates(lst: List[int]) -> List[int]:
seen = set()
result: List[int] = []
for x in lst:
if x not in seen:
seen.add(x)
result.append(x)
return result
Explanation
Iterate once through the list, maintaining a set of seen values. For each element, if it has not been seen, add it to the set and append it to the result list. This preserves the order of first occurrence while eliminating duplicates.
Time complexity: O(n). Space complexity: O(n).
Hints
- Use a set to track seen values.
- Append an element to the result only if it has not been seen.
- Avoid repeated membership checks on a list, which lead to O(n^2) time.