How to Design a Proportional Randomized Sampler?
Company: Pinterest
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
##### Scenario
Randomized promotion engine must pick an item proportional to its score, but scores have no upper bound.
##### Question
Design a sampler pick() that returns an item with probability proportional to its (possibly very large) weight. Discuss memory and precision trade-offs when weights exceed 32-bit limits.
##### Hints
Prefix-sum array + binary search or alias table; rescale or use long/double for huge weights.
Quick Answer: This question evaluates proficiency in randomized algorithms and probabilistic sampling, with emphasis on numerical precision, large-weight handling, and memory-versus-accuracy trade-offs. Commonly asked in Coding & Algorithms interviews for data-science roles, it assesses practical application ability grounded in conceptual understanding of numerical stability and relevant data-structure choices.
You are given an array weights of n non-negative integers representing item weights, and an integer r such that 0 <= r < sum(weights). Implement weighted_pick(weights, r) that returns the smallest index i for which r < weights[0] + weights[1] + ... + weights[i]. This deterministically maps a uniform integer draw r in [0, sum(weights) - 1] to the item chosen proportionally to its weight. At least one weight must be positive. Use integer arithmetic to handle weights larger than 32-bit.
Constraints
- 1 <= n <= 2e5
- weights[i] are integers with 0 <= weights[i]
- At least one weights[i] > 0
- 0 <= r < sum(weights)
- sum(weights) <= 1e20 (weights may exceed 32-bit limits)
- Use integer arithmetic; avoid floating point precision loss
Solution
from typing import List
import bisect
def weighted_pick(weights: List[int], r: int) -> int:
# Build prefix sums using integer arithmetic
prefix = []
total = 0
for w in weights:
if w < 0:
raise ValueError("weights must be non-negative")
total += w
prefix.append(total)
if total <= 0:
raise ValueError("sum of weights must be positive")
if r < 0 or r >= total:
raise ValueError("r must satisfy 0 <= r < sum(weights)")
# Find smallest i with prefix[i] > r
return bisect.bisect_right(prefix, r)
Explanation
Compute the cumulative sums prefix[i] = sum(weights[0..i]). Given r in [0, total-1], the desired index i is the smallest index with prefix[i] > r. Using binary search (bisect_right) over the monotonic prefix array yields O(log n) lookup. This approach handles arbitrarily large weights without precision loss by staying in integer arithmetic. Zero-weight entries create repeated prefix values; bisect_right skips them automatically. For repeated queries, precompute the prefix once; alternatively, the alias method offers O(1) sampling at the cost of O(n) preprocessing and O(n) extra memory.