Deduplicate an Array in Linear Time
Company: Okta
Role: Frontend Engineer
Category: Coding & Algorithms
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's ability to implement linear-time deduplication of primitive arrays, understand hash-based membership tracking, and preserve first-occurrence ordering without mutating the input.
Part 1: Deduplicate an Integer Array in Linear Time
Constraints
- 0 <= len(arr) <= 200000
- -10^9 <= arr[i] <= 10^9
- Do not mutate arr
- Target time complexity: O(n)
Examples
Input: ([3, 1, 3, 2, 1, 4],)
Expected Output: [3, 1, 2, 4]
Explanation: Keep each number the first time it appears.
Input: ([5, 5, 5, 5],)
Expected Output: [5]
Explanation: All later copies of 5 are duplicates.
Input: ([-1, 0, -1, 2, 0, 3],)
Expected Output: [-1, 0, 2, 3]
Explanation: Negative numbers and zero are handled the same way as positive numbers.
Input: ([],)
Expected Output: []
Explanation: An empty array has no duplicates to remove.
Input: ([7],)
Expected Output: [7]
Explanation: A single-element array is already deduplicated.
Hints
- Use a hash-based structure to remember which values have already appeared.
- Build a separate result list and append a value only the first time you see it.
Part 2: Deduplicate Using an Object-Style Lookup Table
Constraints
- 0 <= len(arr) <= 200000
- -10^9 <= arr[i] <= 10^9
- Do not mutate arr
- Do not use a set in your implementation
- Target time complexity: O(n)
Examples
Input: ([4, 2, 4, 4, 3, 2, 1],)
Expected Output: [4, 2, 3, 1]
Explanation: Only the first occurrence of each value is kept.
Input: ([0, -1, 0, -1, 2],)
Expected Output: [0, -1, 2]
Explanation: The lookup table should work for zero and negative numbers too.
Input: ([],)
Expected Output: []
Explanation: The empty array stays empty.
Input: ([9],)
Expected Output: [9]
Explanation: A single value is already unique.
Input: ([1, 2, 3, 4],)
Expected Output: [1, 2, 3, 4]
Explanation: If every value is unique, the output matches the input order.
Hints
- A dictionary can act like a lookup table even if you only care whether a key exists.
- When a value is not already in the lookup table, mark it as seen and append it to the answer.
Part 3: Detect Memory Risk for Object-Based Deduplication
Constraints
- 0 <= len(arr) <= 200000
- 0 <= memory_budget <= 10^12
- Each arr[i] is an ASCII string with length between 0 and 1000
- The estimated cost of one distinct string s is 16 + len(s)
- Target time complexity: O(n + total_characters)
Examples
Input: (['cat', 'dog', 'cat'], 40)
Expected Output: False
Explanation: Distinct strings are 'cat' and 'dog'. Estimated memory is 19 + 19 = 38, which does not exceed 40.
Input: (['apple', 'banana', 'apple', 'pear'], 45)
Expected Output: True
Explanation: Distinct strings are 'apple', 'banana', and 'pear'. Estimated memory is 21 + 22 + 20 = 63, which exceeds 45.
Input: ([], 0)
Expected Output: False
Explanation: No strings means no extra lookup-table memory.
Input: (['a'], 17)
Expected Output: False
Explanation: One distinct string costs exactly 16 + 1 = 17 bytes, which does not exceed the budget.
Input: (['a', 'bb'], 35)
Expected Output: False
Explanation: Estimated memory is 17 + 18 = 35, which matches the budget but does not exceed it.
Hints
- You only pay the memory cost once per distinct string, not once per occurrence.
- Keep a running total and return early if it already exceeds the budget.