PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of streaming randomized algorithms and probabilistic reasoning—specifically reservoir sampling—along with analysis of time and space complexity and use of hash-based structures for duplicate detection; it is in the Coding & Algorithms domain and tests both conceptual probability invariants and practical implementation skills. Such problems are commonly asked to gauge a candidate's ability to design one-pass, linear-time solutions for unbounded data streams and to reason about algorithmic correctness and resource bounds in technical interviews.

  • Medium
  • Amazon
  • Coding & Algorithms
  • Data Scientist

Implement Reservoir Sampling; Analyze Time and Space Complexity

Company: Amazon

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

##### Scenario Process an unbounded stream of user IDs and maintain a uniform random sample of k users in memory. ##### Question Implement reservoir sampling for an incoming stream; analyze its time and space complexity. Explain why the algorithm guarantees each element is selected with probability k⁄n. Given an array of integers, return indices of duplicate values using only set() and enumerate(). Provide O(n) code. ##### Hints Focus on one-pass algorithms and probability invariants.

Quick Answer: This question evaluates understanding of streaming randomized algorithms and probabilistic reasoning—specifically reservoir sampling—along with analysis of time and space complexity and use of hash-based structures for duplicate detection; it is in the Coding & Algorithms domain and tests both conceptual probability invariants and practical implementation skills. Such problems are commonly asked to gauge a candidate's ability to design one-pass, linear-time solutions for unbounded data streams and to reason about algorithmic correctness and resource bounds in technical interviews.

Given a list of integers nums, return a list of indices i such that nums[i] is a value that occurs more than once in nums. Return indices in increasing order. Implement an O(n) solution using only set() and enumerate() (no dictionaries, counters, or sorting).

Constraints

  • 0 <= len(nums) <= 200000
  • nums[i] fits in 32-bit signed integer
  • Time complexity must be O(n)
  • Use only set() and enumerate(); do not use dict, Counter, or sorting
  • Return indices in increasing order

Solution

from typing import List

def duplicate_indices(nums: list[int]) -> list[int]:
    seen = set()
    dup_vals = set()
    for x in nums:
        if x in seen:
            dup_vals.add(x)
        else:
            seen.add(x)
    return [i for i, x in enumerate(nums) if x in dup_vals]
Explanation
Traverse nums once to identify which values occur more than once using two sets: 'seen' for values observed and 'dup_vals' for values seen again. Then, traverse nums with enumerate and collect indices whose values are in 'dup_vals'. This preserves increasing index order without sorting and uses only sets and enumerate.

Time complexity: O(n). Space complexity: O(u) additional space for sets, where u is the number of unique values; plus O(k) for the output indices where k is the number of indices returned.

Hints

  1. First pass: use a set to track seen values and collect which values are duplicates.
  2. Second pass: use enumerate to collect indices whose values are in the duplicates set.
  3. No need to sort; enumerating maintains increasing index order.
Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement Datacenter Router Commands - Amazon (hard)
  • Replace Delimited Tokens in a String - Amazon (medium)
  • Minimize Circular Redistribution Cost - Amazon (medium)
  • Find the Most Common Visit Pattern - Amazon (hard)
  • Maximize Value Under a Budget - Amazon (medium)