Implement weighted random choice
Company: Pinterest
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of weighted random sampling, probability distributions, input validation, and algorithmic time and space complexity in the context of implementing a sampling routine.
Part 1: Deterministic Weighted Random Choice for One Sample
Constraints
- `1 <= len(items) <= 10^5` for valid inputs
- If `p` is not `None`, then `len(p) == len(items)`
- Each probability must be a non-negative number
- For valid weighted input, `abs(sum(p) - 1.0) <= 1e-9`
- `0 <= r < 1`
Examples
Input: (['red', 'green', 'blue'], [0.1, 0.3, 0.6], 0.35)
Expected Output: 'green'
Explanation: Cumulative probabilities are 0.1, 0.4, 1.0. Since 0.35 < 0.4, the answer is 'green'.
Input: ([10, 20, 30, 40], None, 0.74)
Expected Output: 30
Explanation: Uniform buckets are [0,0.25), [0.25,0.5), [0.5,0.75), [0.75,1). 0.74 falls in the third bucket.
Input: ([7], None, 0.999)
Expected Output: 7
Explanation: With one item, that item is always returned.
Input: ([], None, 0.2)
Expected Output: 'INVALID'
Explanation: An empty item list is invalid.
Input: (['a', 'b'], [0.6, -0.4], 0.2)
Expected Output: 'INVALID'
Explanation: Negative probabilities are invalid.
Hints
- Think of `r` as landing inside one interval of a cumulative probability array.
- For uniform sampling, divide `[0, 1)` into `n` equal-sized buckets.
Part 2: Deterministic Weighted Random Choice for k Samples With Replacement
Constraints
- `1 <= len(items) <= 10^5` for valid inputs
- `0 <= size <= 10^5`
- `len(random_values) == size`
- Each `random_values[j]` must satisfy `0 <= random_values[j] < 1`
- If `p` is not `None`, then `len(p) == len(items)`
- Each probability must be a non-negative number
- For valid weighted input, `abs(sum(p) - 1.0) <= 1e-9`
Examples
Input: (['a', 'b', 'c'], [0.2, 0.5, 0.3], 4, [0.05, 0.2, 0.69, 0.95])
Expected Output: ['a', 'b', 'b', 'c']
Explanation: Using cumulative probabilities [0.2, 0.7, 1.0], the four draws map to a, b, b, c.
Input: ([1, 2, 3], None, 5, [0.0, 0.34, 0.35, 0.99, 0.66])
Expected Output: [1, 2, 2, 3, 2]
Explanation: Uniform sampling over 3 items maps each value by bucket index `int(r * 3)`.
Input: (['x', 'y'], [0.5, 0.5], 0, [])
Expected Output: []
Explanation: Requesting zero samples should return an empty list.
Input: ([7], None, 3, [0.1, 0.5, 0.999])
Expected Output: [7, 7, 7]
Explanation: With one item, every sample returns that same item.
Input: (['a', 'b'], [0.5, 0.5], 3, [0.1, 0.2])
Expected Output: 'INVALID'
Explanation: The number of random values must exactly match `size`.
Hints
- Precompute cumulative probabilities once instead of rescanning `p` from scratch for every sample.
- After building prefix sums, binary search can locate the bucket for each random value.