Find duplicates in a list efficiently

Q: Find duplicates in a list efficiently

This is a Coding & Algorithms interview question from TCS for Data Scientist roles. View the full question and solution on PracHub.

Q: How do I approach Coding & Algorithms interview questions?

Coding & Algorithms questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master coding & algorithms interviews.

Question

Given a Python list that can contain integers (possibly negative) and may be very large, write code and explain approaches to find all values that appear more than once, along with their occurrence counts and the index of their first appearance, while preserving the original order of first occurrence. For example, input: [3, 1, 2, 3, -1, 2, 3, 4, 1] → output should enumerate 3 (count 3, first idx 0), 2 (count 2, first idx 2), 1 (count 2, first idx 1) in that order. Then answer:

Provide an O(n) time, O(k) extra space solution (k = number of distinct values) and analyze time/space complexities. Explain how you maintain stable output order.
If the integers are guaranteed in the range 0..n-1 and in-place modification is allowed, design an O(n) time, O(1) extra space method (e.g., index marking or modular counting). Explain how to adapt it when negatives may appear.
For streaming data that cannot fit in memory (n up to 1e8), outline a solution using fixed memory (e.g., hashing to buckets on disk, Count-Min Sketch + exact verification). Analyze error bounds if you use probabilistic structures.
Discuss trade-offs among hash-based counting, sorting-based methods (O(n log n), O(1) extra space if in-place), and bitset/bitmap approaches for large integer domains. Include worst-case behaviors and cache implications.
Provide thorough tests covering empty list, all-unique, all-duplicates, very large inputs, and mixtures of positive/negative values.

Find duplicates in a list efficiently

Comments (0)