Implement np.random.choice
Company: Pinterest
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of random sampling algorithms, probability distributions (uniform and weighted), handling of weights and edge-case validation, and engineering trade-offs in data-structure-aware implementation for both integer ranges and explicit collections.
Part 1: Deterministic Choice With Replacement
Constraints
- If `a` is an integer, `1 <= a <= 10^5`.
- If `a` is a list/tuple, `1 <= len(a) <= 10^5`.
- `size` is `None` or `0 <= size <= 10^5`.
- `random_values` contains at least 1 value when `size is None`, otherwise at least `size` values.
- Each random value satisfies `0 <= r < 1`.
- If `p` is provided, it has the same length as the population, all weights are non-negative, and the total weight is positive.
Examples
Input: (5, 4, None, [0.02, 0.4, 0.99, 0.6])
Expected Output: [0, 2, 4, 3]
Explanation: Uniform sampling from `[0,1,2,3,4]` with indices `0,2,4,3`.
Input: (['a', 'b', 'c'], 5, [0.1, 0.3, 0.6], [0.05, 0.1, 0.39, 0.4, 0.95])
Expected Output: ['a', 'b', 'b', 'c', 'c']
Explanation: Cumulative probabilities are `[0.1, 0.4, 1.0]`; note that boundary values move to the next bucket because we choose the first cumulative value strictly greater than `r`.
Input: (['x'], None, None, [0.73])
Expected Output: 'x'
Explanation: Edge case: single-element population and `size=None`.
Input: (4, None, [0.25, 0.25, 0.25, 0.25], [0.74])
Expected Output: 2
Explanation: Weighted but uniform probabilities; `0.74` falls into the third bucket.
Hints
- For uniform sampling with replacement, map `r` to an index with `int(r * n)`.
- For weighted sampling, build cumulative probabilities once, then use binary search for each draw.
Part 2: Deterministic Choice Without Replacement
Constraints
- If `a` is an integer, `1 <= a <= 2000`.
- If `a` is a list/tuple, `1 <= len(a) <= 2000`.
- `size` is `None` or `0 <= size <= n`, where `n` is the population size.
- Each random value satisfies `0 <= r < 1`.
- If `p` is provided, it has the same length as the population, all weights are non-negative, the total weight is positive, and there are enough positive-weight items to finish all draws.
Examples
Input: (5, 3, None, [0.1, 0.7, 0.2])
Expected Output: [0, 3, 1]
Explanation: Remaining items shrink after each draw: `[0,1,2,3,4] -> [1,2,3,4] -> [1,2,4]`.
Input: (['a', 'b', 'c'], 3, [0.2, 0.5, 0.3], [0.6, 0.1, 0.8])
Expected Output: ['b', 'a', 'c']
Explanation: Draw `b` first, then renormalize weights of `a` and `c`, then continue.
Input: (4, 2, [0.1, 0.2, 0.3, 0.4], [0.95, 0.4])
Expected Output: [3, 1]
Explanation: First draw picks item `3`; remaining weights become `[0.1, 0.2, 0.3]` and are renormalized.
Input: (['x'], None, None, [0.99])
Expected Output: 'x'
Explanation: Edge case: one element and a single draw.
Hints
- For the uniform case, think of choosing an index from the remaining array, then removing that item.
- For the weighted case, after removing an item you must also remove its weight and rebuild the cumulative distribution for the next draw.
Part 3: Validate and Analyze a Choice Call
Constraints
- `a` is either an integer or a list/tuple.
- If `a` is an integer, it must be positive; if it is a list/tuple, it must be non-empty.
- `size` must be `None` or a non-negative integer.
- If `replace` is `False`, then `k <= n` must hold.
- If `p` is provided, it must match the population length, contain only non-negative numbers, and have positive total sum.
- For weighted sampling without replacement, there must be at least `k` positive-probability items.
- Return normalized probabilities rounded to 12 decimal places.
Examples
Input: (5, None, True, None)
Expected Output: (5, 1, 'uniform_with', 'O(k)', None)
Explanation: Single uniform draw with replacement.
Input: (['a', 'b', 'c'], 2, True, [2, 1, 1])
Expected Output: (3, 2, 'weighted_with', 'O(n + k log n)', [0.5, 0.25, 0.25])
Explanation: The weights are normalized before analysis.
Input: ([10, 20], 3, False, None)
Expected Output: 'sample_too_large'
Explanation: Cannot draw 3 items without replacement from a population of 2.
Input: (['x', 'y'], 1, False, [0.3])
Expected Output: 'invalid_p_length'
Explanation: Probability list length must match the population size.
Input: (4, 2, False, [0, 0, 0, 0])
Expected Output: 'invalid_p_values'
Explanation: Probability weights must have positive total sum.
Hints
- First normalize `a` into a population size `n`, and convert `size=None` into an effective sample count `k=1`.
- Handle validation before deciding the mode and the associated complexity string.