How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Pinterest.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Pinterest during technical interviews.

Implement weighted random choice | Pinterest Coding Question

Implement weighted random choice

Company: Pinterest

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Implement a simplified version of `np.random.choice` without calling the built-in function. You are given an array `items` of length `n` and an optional probability array `p` of length `n`, where `p[i]` is the probability of returning `items[i]` and the probabilities sum to 1 within numerical tolerance. Requirements: - If `p` is omitted, sample uniformly. - Support returning a single sample. - Extend the function to support `size = k`, returning `k` independent samples with replacement. - Explain how you would validate invalid inputs such as negative probabilities, length mismatches, or probabilities that do not sum correctly. - Discuss the time and space complexity of your approach. Bonus: describe an efficient approach for repeated sampling or for sampling without replacement.

Quick Answer: This question evaluates understanding of weighted random sampling, probability distributions, input validation, and algorithmic time and space complexity in the context of implementing a sampling routine.

Part 1: Deterministic Weighted Random Choice for One Sample

You are given a list `items`, an optional probability list `p`, and a floating-point value `r` in the range `[0, 1)`. Simulate one weighted random choice without using any built-in random-choice function. If `p` is `None`, choose uniformly from `items`. If `p` is provided, `p[i]` is the probability of returning `items[i]`. Use cumulative probabilities and return the first item whose cumulative probability is strictly greater than `r`. If the input is invalid, return the string `'INVALID'`. Invalid input includes: empty `items`, `r` outside `[0, 1)`, `p` length mismatch, negative probabilities, or probabilities whose sum differs from `1` by more than `1e-9`.

Constraints

`1 <= len(items) <= 10^5` for valid inputs
If `p` is not `None`, then `len(p) == len(items)`
Each probability must be a non-negative number
For valid weighted input, `abs(sum(p) - 1.0) <= 1e-9`
`0 <= r < 1`

Examples

Input: (['red', 'green', 'blue'], [0.1, 0.3, 0.6], 0.35)

Expected Output: 'green'

Explanation: Cumulative probabilities are 0.1, 0.4, 1.0. Since 0.35 < 0.4, the answer is 'green'.

Input: ([10, 20, 30, 40], None, 0.74)

Expected Output: 30

Explanation: Uniform buckets are [0,0.25), [0.25,0.5), [0.5,0.75), [0.75,1). 0.74 falls in the third bucket.

Input: ([7], None, 0.999)

Expected Output: 7

Explanation: With one item, that item is always returned.

Input: ([], None, 0.2)

Expected Output: 'INVALID'

Explanation: An empty item list is invalid.

Input: (['a', 'b'], [0.6, -0.4], 0.2)

Expected Output: 'INVALID'

Explanation: Negative probabilities are invalid.

Hints

Think of `r` as landing inside one interval of a cumulative probability array.
For uniform sampling, divide `[0, 1)` into `n` equal-sized buckets.

Part 2: Deterministic Weighted Random Choice for k Samples With Replacement

You are given a list `items`, an optional probability list `p`, an integer `size`, and a list `random_values` of length `size`. For each value `random_values[j]` in `[0, 1)`, independently simulate one weighted random choice and return all results in order. Sampling is with replacement, so the same item may appear multiple times. If `p` is `None`, sample uniformly. If the input is invalid, return the string `'INVALID'`. Invalid input includes: empty `items`, negative `size`, `random_values` length mismatch, any `random_values[j]` outside `[0,1)`, `p` length mismatch, negative probabilities, or probabilities whose sum differs from `1` by more than `1e-9`.

Constraints

`1 <= len(items) <= 10^5` for valid inputs
`0 <= size <= 10^5`
`len(random_values) == size`
Each `random_values[j]` must satisfy `0 <= random_values[j] < 1`
If `p` is not `None`, then `len(p) == len(items)`
Each probability must be a non-negative number
For valid weighted input, `abs(sum(p) - 1.0) <= 1e-9`

Examples

Input: (['a', 'b', 'c'], [0.2, 0.5, 0.3], 4, [0.05, 0.2, 0.69, 0.95])

Expected Output: ['a', 'b', 'b', 'c']

Explanation: Using cumulative probabilities [0.2, 0.7, 1.0], the four draws map to a, b, b, c.

Input: ([1, 2, 3], None, 5, [0.0, 0.34, 0.35, 0.99, 0.66])

Expected Output: [1, 2, 2, 3, 2]

Explanation: Uniform sampling over 3 items maps each value by bucket index `int(r * 3)`.

Input: (['x', 'y'], [0.5, 0.5], 0, [])

Expected Output: []

Explanation: Requesting zero samples should return an empty list.

Input: ([7], None, 3, [0.1, 0.5, 0.999])

Expected Output: [7, 7, 7]

Explanation: With one item, every sample returns that same item.

Input: (['a', 'b'], [0.5, 0.5], 3, [0.1, 0.2])

Expected Output: 'INVALID'

Explanation: The number of random values must exactly match `size`.

Hints

Precompute cumulative probabilities once instead of rescanning `p` from scratch for every sample.
After building prefix sums, binary search can locate the bucket for each random value.

Quick Overview

This question evaluates understanding of weighted random sampling, probability distributions, input validation, and algorithmic time and space complexity in the context of implementing a sampling routine.