How to Design a Proportional Randomized Sampler?

Q: How to Design a Proportional Randomized Sampler?

This question evaluates proficiency in randomized algorithms and probabilistic sampling, with emphasis on numerical precision, large-weight handling, and memory-versus-accuracy trade-offs. Commonly asked in Coding & Algorithms interviews for data-science roles, it assesses practical application ability grounded in conceptual understanding of numerical stability and relevant data-structure choices.

Q: How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

Q: What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Pinterest.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Pinterest during technical interviews.

Question

##### Scenario

Randomized promotion engine must pick an item proportional to its score, but scores have no upper bound.

##### Question

Design a sampler pick() that returns an item with probability proportional to its (possibly very large) weight.  Discuss memory and precision trade-offs when weights exceed 32-bit limits.

##### Hints

Prefix-sum array + binary search or alias table; rescale or use long/double for huge weights.

PracHub · Accepted Answer

from typing import List import bisect def weighted_pick(weights: List[int], r: int) -> int: # Build prefix sums using integer arithmetic prefix = [] total = 0 for w in weights: if w < 0: raise ValueError("weights must be non-negative") total += w prefix.append(total) if total <= 0: raise ValueError("sum of weights must be positive") if r < 0 or r >= total: raise ValueError("r must satisfy 0 <= r < sum(weights)") # Find smallest i with prefix[i] > r return bisect.bisect_right(prefix, r) Compute the cumulative sums prefix[i] = sum(weights[0..i]). Given r in [0, total-1], the desired index i is the smallest index with prefix[i] > r. Using binary search (bisect_right) over the monotonic prefix array yields O(log n) lookup. This approach handles arbitrarily large weights without precision loss by staying in integer arithmetic. Zero-weight entries create repeated prefix values; bisect_right skips them automatically. For repeated queries, precompute the prefix once; alternatively, the alias method offers O(1) sampling at the cost of O(n) preprocessing and O(n) extra memory.

Quick Overview

How to Design a Proportional Randomized Sampler?

Constraints

Solution

Explanation

Hints