How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at LinkedIn.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at LinkedIn during technical interviews.

Sample index from weighted probability distribution

Quick Overview

This question evaluates understanding of sampling from discrete probability distributions, randomized algorithm design, handling unnormalized weights and expected-value analysis for rejection-style approaches, along with trade-offs for scalable implementations.

Sample index from weighted probability distribution

Company: LinkedIn

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

Given an array `weights[0..M-1]` representing a discrete distribution over `M` outcomes, implement a function `sampleIndex(weights)` that returns an index `i` with probability proportional to `weights[i]`. Assume all `weights[i] >= 0` and at least one weight is positive. Follow-ups (handle in the same discussion/solution): 1) What if the values in `weights` do **not** sum to `1` (they are unnormalized)? Provide at least two ways to handle this. 2) If you use a rejection-sampling-style approach when the sum is not `1`, what is the **expected number of trials** as a function of the total sum? 3) If `M` is very large and the probability mass is highly concentrated on a small number of indices, what engineering/algorithmic optimizations would you consider? Discuss trade-offs (time, memory, preprocessing cost, update cost).

Quick Answer: This question evaluates understanding of sampling from discrete probability distributions, randomized algorithm design, handling unnormalized weights and expected-value analysis for rejection-style approaches, along with trade-offs for scalable implementations.

Part 1: Deterministic weighted sample by ticket threshold

You are given a list of nonnegative integer `weights`. Imagine outcome `i` owns exactly `weights[i]` consecutive tickets in the range `[0, sum(weights) - 1]`. Given an integer ticket `t`, return the index that owns ticket `t`. If `t` were chosen uniformly at random, this would sample index `i` with probability proportional to `weights[i]`.

Constraints

1 <= len(weights) <= 200000
0 <= weights[i] <= 10^9
sum(weights) > 0
0 <= t < sum(weights)

Examples

Input: ([1, 3, 2], 0)

Expected Output: 0

Explanation: The ticket ranges are: index 0 -> [0], index 1 -> [1, 2, 3], index 2 -> [4, 5]. Ticket 0 belongs to index 0.

Input: ([1, 3, 2], 3)

Expected Output: 1

Explanation: Index 1 owns tickets 1 through 3, so ticket 3 belongs to index 1.

Input: ([1, 3, 2], 4)

Expected Output: 2

Explanation: Index 2 owns tickets 4 and 5, so ticket 4 belongs to index 2.

Input: ([0, 2, 0, 3], 2)

Expected Output: 3

Explanation: Index 0 and 2 own no tickets. Index 1 owns tickets 0 and 1, and index 3 owns tickets 2, 3, and 4. So ticket 2 belongs to index 3.

Input: ([5], 4)

Expected Output: 0

Explanation: There is only one index, and it owns tickets 0 through 4.

Input: ([], 0)

Expected Output: -1

Explanation: There are no weights and no valid tickets, so return -1.

Input: ([0, 0, 0], 0)

Expected Output: -1

Explanation: The total number of tickets is 0, so no ticket is valid.

Input: ([2, 1], 3)

Expected Output: -1

Explanation: The valid ticket range is 0 through 2. Ticket 3 is out of range.

Hints

Build prefix sums so each index corresponds to an interval of ticket values.
You need the first prefix sum that is strictly greater than `t`.

Part 2: Two equivalent ways to sample from unnormalized weights

The values in `weights` are nonnegative integers, but they do not necessarily sum to 1. For each rational query `u = numerator / denominator` in `[0, 1)`, compute the sampled index in two ways: (A) normalize the weights into probabilities and sample using cumulative probabilities, and (B) leave the weights unnormalized, scale the query by the total sum, and sample using cumulative weights. Return both result lists. The two methods should always agree.

Constraints

1 <= len(weights) <= 200000
0 <= weights[i] <= 10^9
sum(weights) > 0
1 <= len(queries) <= 200000
For each query `(a, b)`: 0 <= a < b <= 10^9

Examples

Input: ([2, 2, 2], [(0, 1), (1, 4), (1, 3), (1, 2), (2, 3), (5, 6)])

Expected Output: ([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2])

Explanation: The total weight is 6, so the intervals are [0, 1/3), [1/3, 2/3), and [2/3, 1). Boundary values like 1/3 and 2/3 move to the next index because the cumulative value must be strictly greater than the query.

Input: ([7], [(0, 1), (1, 2), (999, 1000)])

Expected Output: ([0, 0, 0], [0, 0, 0])

Explanation: There is only one index, so every valid query samples index 0.

Input: ([0, 5, 0, 5], [(0, 1), (49, 100), (1, 2), (99, 100)])

Expected Output: ([1, 1, 3, 3], [1, 1, 3, 3])

Explanation: Zero weights create empty intervals. Query 1/2 lands exactly on the boundary after the second positive block, so the answer becomes index 3.

Input: ([1, 3, 2, 4], [(0, 1), (1, 10), (39, 100), (3, 5), (9, 10)])

Expected Output: ([0, 1, 1, 3, 3], [0, 1, 1, 3, 3])

Explanation: The prefix sums are [1, 4, 6, 10]. For example, 1/10 corresponds to threshold 1, so the first prefix strictly greater than 1 is 4 at index 1.

Input: ([3, 1], [])

Expected Output: ([], [])

Explanation: With no queries, both result lists are empty.

Hints

Method A compares `u` against cumulative probabilities `prefix[i] / total`.
Method B compares `u * total` against cumulative raw weights, so you do not need to normalize explicitly.

Part 3: Expected number of rejection-sampling trials

Suppose outcome `i` has probability `weights[i] / scale`, and the total probability mass `S = sum(weights) / scale` satisfies `0 < S <= 1`. In one trial, you either output some index with total probability `S`, or reject with probability `1 - S` and try again. Compute the expected number of trials until the first accepted output, and return it as a reduced fraction `(numerator, denominator)`.

Constraints

1 <= len(weights) <= 200000
0 <= weights[i] <= 10^9
1 <= scale <= 10^18
0 < sum(weights) <= scale

Examples

Input: ([1, 2], 10)

Expected Output: (10, 3)

Explanation: The total acceptance probability is (1 + 2) / 10 = 3/10. The expected number of trials is 1 / (3/10) = 10/3.

Input: ([2, 3, 5], 10)

Expected Output: (1, 1)

Explanation: The total acceptance probability is 10/10 = 1, so the first trial always succeeds. The expectation is 1.

Input: ([0, 4, 0], 8)

Expected Output: (2, 1)

Explanation: The total acceptance probability is 4/8 = 1/2. The expected number of trials is 1 / (1/2) = 2.

Input: ([6], 9)

Expected Output: (3, 2)

Explanation: With a single outcome, the acceptance probability is 6/9 = 2/3. The expected number of trials is 1 / (2/3) = 3/2.

Input: ([7, 7], 28)

Expected Output: (2, 1)

Explanation: The total acceptance probability is 14/28 = 1/2. The expected number of trials is 2.

Hints

A single trial succeeds with probability `S = sum(weights) / scale`.
The number of trials until first success follows a geometric distribution with expectation `1 / S`.

Part 4: Compressed sampler for large concentrated distributions

When `M` is very large but only a small support carries positive probability mass, a practical optimization is to compress the distribution to just the positive-weight indices. Given `weights` and many ticket queries `t`, preprocess the positive support and answer each query using only that compressed representation. Return both the support indices and the sampled original indices. This optimization is especially useful for many queries on a static distribution; the trade-off is an `O(M)` preprocessing pass and rebuild cost if weights change frequently.

Constraints

1 <= len(weights) <= 200000
0 <= weights[i] <= 10^9
sum(weights) > 0
1 <= len(queries) <= 200000
For every query `t`: 0 <= t < sum(weights)
The distribution is static across all queries

Examples

Input: ([0, 0, 3, 0, 2], [0, 2, 3, 4])

Expected Output: ([2, 4], [2, 2, 4, 4])

Explanation: Only indices 2 and 4 have positive weight, so the compressed support is [2, 4]. Their cumulative weights are [3, 5]. Tickets 0 and 2 fall in index 2's range, while 3 and 4 fall in index 4's range.

Input: ([7, 0, 0, 0], [0])

Expected Output: ([0], [0])

Explanation: Index 0 is the only positive-weight index, so every valid ticket maps to 0.

Input: ([], [])

Expected Output: ([], [])

Explanation: With no weights and no queries, both the support and sampled outputs are empty.

Input: ([1, 0, 1, 0, 1], [0, 1, 2])

Expected Output: ([0, 2, 4], [0, 2, 4])

Explanation: Each positive index has weight 1, so the three ticket positions map directly to indices 0, 2, and 4.

Input: ([0, 5, 0, 2], [4, 5, 6])

Expected Output: ([1, 3], [1, 3, 3])

Explanation: Compressed support is [1, 3] with cumulative weights [5, 7]. Ticket 4 still belongs to index 1, while tickets 5 and 6 belong to index 3.

Hints

Indices with weight 0 can never be sampled, so they do not need to appear in the search structure.
Build prefix sums only over the positive-weight support, then binary-search each ticket there.

Quick Overview

Sample index from weighted probability distribution

Company: LinkedIn

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

Part 1: Deterministic weighted sample by ticket threshold

Constraints

1 <= len(weights) <= 200000
0 <= weights[i] <= 10^9
sum(weights) > 0
0 <= t < sum(weights)

Examples

Input: ([1, 3, 2], 0)

Expected Output: 0

Explanation: The ticket ranges are: index 0 -> [0], index 1 -> [1, 2, 3], index 2 -> [4, 5]. Ticket 0 belongs to index 0.

Input: ([1, 3, 2], 3)

Expected Output: 1

Explanation: Index 1 owns tickets 1 through 3, so ticket 3 belongs to index 1.

Input: ([1, 3, 2], 4)

Expected Output: 2

Explanation: Index 2 owns tickets 4 and 5, so ticket 4 belongs to index 2.

Input: ([0, 2, 0, 3], 2)

Expected Output: 3

Explanation: Index 0 and 2 own no tickets. Index 1 owns tickets 0 and 1, and index 3 owns tickets 2, 3, and 4. So ticket 2 belongs to index 3.

Input: ([5], 4)

Expected Output: 0

Explanation: There is only one index, and it owns tickets 0 through 4.

Input: ([], 0)

Expected Output: -1

Explanation: There are no weights and no valid tickets, so return -1.

Input: ([0, 0, 0], 0)

Expected Output: -1

Explanation: The total number of tickets is 0, so no ticket is valid.

Input: ([2, 1], 3)

Expected Output: -1

Explanation: The valid ticket range is 0 through 2. Ticket 3 is out of range.

Hints

Build prefix sums so each index corresponds to an interval of ticket values.
You need the first prefix sum that is strictly greater than `t`.

Part 2: Two equivalent ways to sample from unnormalized weights

Constraints

1 <= len(weights) <= 200000
0 <= weights[i] <= 10^9
sum(weights) > 0
1 <= len(queries) <= 200000
For each query `(a, b)`: 0 <= a < b <= 10^9

Examples

Input: ([2, 2, 2], [(0, 1), (1, 4), (1, 3), (1, 2), (2, 3), (5, 6)])

Expected Output: ([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2])

Input: ([7], [(0, 1), (1, 2), (999, 1000)])

Expected Output: ([0, 0, 0], [0, 0, 0])

Explanation: There is only one index, so every valid query samples index 0.

Input: ([0, 5, 0, 5], [(0, 1), (49, 100), (1, 2), (99, 100)])

Expected Output: ([1, 1, 3, 3], [1, 1, 3, 3])

Explanation: Zero weights create empty intervals. Query 1/2 lands exactly on the boundary after the second positive block, so the answer becomes index 3.

Input: ([1, 3, 2, 4], [(0, 1), (1, 10), (39, 100), (3, 5), (9, 10)])

Expected Output: ([0, 1, 1, 3, 3], [0, 1, 1, 3, 3])

Explanation: The prefix sums are [1, 4, 6, 10]. For example, 1/10 corresponds to threshold 1, so the first prefix strictly greater than 1 is 4 at index 1.

Input: ([3, 1], [])

Expected Output: ([], [])

Explanation: With no queries, both result lists are empty.

Hints

Method A compares `u` against cumulative probabilities `prefix[i] / total`.
Method B compares `u * total` against cumulative raw weights, so you do not need to normalize explicitly.

Part 3: Expected number of rejection-sampling trials

Constraints

1 <= len(weights) <= 200000
0 <= weights[i] <= 10^9
1 <= scale <= 10^18
0 < sum(weights) <= scale

Examples

Input: ([1, 2], 10)

Expected Output: (10, 3)

Explanation: The total acceptance probability is (1 + 2) / 10 = 3/10. The expected number of trials is 1 / (3/10) = 10/3.

Input: ([2, 3, 5], 10)

Expected Output: (1, 1)

Explanation: The total acceptance probability is 10/10 = 1, so the first trial always succeeds. The expectation is 1.

Input: ([0, 4, 0], 8)

Expected Output: (2, 1)

Explanation: The total acceptance probability is 4/8 = 1/2. The expected number of trials is 1 / (1/2) = 2.

Input: ([6], 9)

Expected Output: (3, 2)

Explanation: With a single outcome, the acceptance probability is 6/9 = 2/3. The expected number of trials is 1 / (2/3) = 3/2.

Input: ([7, 7], 28)

Expected Output: (2, 1)

Explanation: The total acceptance probability is 14/28 = 1/2. The expected number of trials is 2.

Hints

A single trial succeeds with probability `S = sum(weights) / scale`.
The number of trials until first success follows a geometric distribution with expectation `1 / S`.

Part 4: Compressed sampler for large concentrated distributions

Constraints

1 <= len(weights) <= 200000
0 <= weights[i] <= 10^9
sum(weights) > 0
1 <= len(queries) <= 200000
For every query `t`: 0 <= t < sum(weights)
The distribution is static across all queries

Examples

Input: ([0, 0, 3, 0, 2], [0, 2, 3, 4])

Expected Output: ([2, 4], [2, 2, 4, 4])

Input: ([7, 0, 0, 0], [0])

Expected Output: ([0], [0])

Explanation: Index 0 is the only positive-weight index, so every valid ticket maps to 0.

Input: ([], [])

Expected Output: ([], [])

Explanation: With no weights and no queries, both the support and sampled outputs are empty.

Input: ([1, 0, 1, 0, 1], [0, 1, 2])

Expected Output: ([0, 2, 4], [0, 2, 4])

Explanation: Each positive index has weight 1, so the three ticket positions map directly to indices 0, 2, and 4.

Input: ([0, 5, 0, 2], [4, 5, 6])

Expected Output: ([1, 3], [1, 3, 3])

Explanation: Compressed support is [1, 3] with cumulative weights [5, 7]. Ticket 4 still belongs to index 1, while tickets 5 and 6 belong to index 3.

Hints

Indices with weight 0 can never be sampled, so they do not need to appear in the search structure.
Build prefix sums only over the positive-weight support, then binary-search each ticket there.