How to Design a Proportional Randomized Sampler?
Company: Pinterest
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates proficiency in randomized algorithms and probabilistic sampling, with emphasis on numerical precision, large-weight handling, and memory-versus-accuracy trade-offs. Commonly asked in Coding & Algorithms interviews for data-science roles, it assesses practical application ability grounded in conceptual understanding of numerical stability and relevant data-structure choices.
Constraints
- 1 <= n <= 2e5
- weights[i] are integers with 0 <= weights[i]
- At least one weights[i] > 0
- 0 <= r < sum(weights)
- sum(weights) <= 1e20 (weights may exceed 32-bit limits)
- Use integer arithmetic; avoid floating point precision loss
Hints
- Build a prefix-sum array of weights and binary search for the first prefix strictly greater than r.
- Use 64-bit or arbitrary-precision integers; avoid floating point to prevent precision loss with huge weights.
- Zero-weight items are never selected; using bisect-right over prefix sums naturally skips them.
- For many repeated picks, precompute the prefix once or consider the alias method for O(1) sampling.