PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of weighted random sampling, probability distributions, input validation, and algorithmic time and space complexity in the context of implementing a sampling routine.

  • medium
  • Pinterest
  • Coding & Algorithms
  • Data Scientist

Implement weighted random choice

Company: Pinterest

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Implement a simplified version of `np.random.choice` without calling the built-in function. You are given an array `items` of length `n` and an optional probability array `p` of length `n`, where `p[i]` is the probability of returning `items[i]` and the probabilities sum to 1 within numerical tolerance. Requirements: - If `p` is omitted, sample uniformly. - Support returning a single sample. - Extend the function to support `size = k`, returning `k` independent samples with replacement. - Explain how you would validate invalid inputs such as negative probabilities, length mismatches, or probabilities that do not sum correctly. - Discuss the time and space complexity of your approach. Bonus: describe an efficient approach for repeated sampling or for sampling without replacement.

Quick Answer: This question evaluates understanding of weighted random sampling, probability distributions, input validation, and algorithmic time and space complexity in the context of implementing a sampling routine.

Part 1: Deterministic Weighted Random Choice for One Sample

You are given a list `items`, an optional probability list `p`, and a floating-point value `r` in the range `[0, 1)`. Simulate one weighted random choice without using any built-in random-choice function. If `p` is `None`, choose uniformly from `items`. If `p` is provided, `p[i]` is the probability of returning `items[i]`. Use cumulative probabilities and return the first item whose cumulative probability is strictly greater than `r`. If the input is invalid, return the string `'INVALID'`. Invalid input includes: empty `items`, `r` outside `[0, 1)`, `p` length mismatch, negative probabilities, or probabilities whose sum differs from `1` by more than `1e-9`.

Constraints

  • `1 <= len(items) <= 10^5` for valid inputs
  • If `p` is not `None`, then `len(p) == len(items)`
  • Each probability must be a non-negative number
  • For valid weighted input, `abs(sum(p) - 1.0) <= 1e-9`
  • `0 <= r < 1`

Examples

Input: (['red', 'green', 'blue'], [0.1, 0.3, 0.6], 0.35)

Expected Output: 'green'

Explanation: Cumulative probabilities are 0.1, 0.4, 1.0. Since 0.35 < 0.4, the answer is 'green'.

Input: ([10, 20, 30, 40], None, 0.74)

Expected Output: 30

Explanation: Uniform buckets are [0,0.25), [0.25,0.5), [0.5,0.75), [0.75,1). 0.74 falls in the third bucket.

Input: ([7], None, 0.999)

Expected Output: 7

Explanation: With one item, that item is always returned.

Input: ([], None, 0.2)

Expected Output: 'INVALID'

Explanation: An empty item list is invalid.

Input: (['a', 'b'], [0.6, -0.4], 0.2)

Expected Output: 'INVALID'

Explanation: Negative probabilities are invalid.

Hints

  1. Think of `r` as landing inside one interval of a cumulative probability array.
  2. For uniform sampling, divide `[0, 1)` into `n` equal-sized buckets.

Part 2: Deterministic Weighted Random Choice for k Samples With Replacement

You are given a list `items`, an optional probability list `p`, an integer `size`, and a list `random_values` of length `size`. For each value `random_values[j]` in `[0, 1)`, independently simulate one weighted random choice and return all results in order. Sampling is with replacement, so the same item may appear multiple times. If `p` is `None`, sample uniformly. If the input is invalid, return the string `'INVALID'`. Invalid input includes: empty `items`, negative `size`, `random_values` length mismatch, any `random_values[j]` outside `[0,1)`, `p` length mismatch, negative probabilities, or probabilities whose sum differs from `1` by more than `1e-9`.

Constraints

  • `1 <= len(items) <= 10^5` for valid inputs
  • `0 <= size <= 10^5`
  • `len(random_values) == size`
  • Each `random_values[j]` must satisfy `0 <= random_values[j] < 1`
  • If `p` is not `None`, then `len(p) == len(items)`
  • Each probability must be a non-negative number
  • For valid weighted input, `abs(sum(p) - 1.0) <= 1e-9`

Examples

Input: (['a', 'b', 'c'], [0.2, 0.5, 0.3], 4, [0.05, 0.2, 0.69, 0.95])

Expected Output: ['a', 'b', 'b', 'c']

Explanation: Using cumulative probabilities [0.2, 0.7, 1.0], the four draws map to a, b, b, c.

Input: ([1, 2, 3], None, 5, [0.0, 0.34, 0.35, 0.99, 0.66])

Expected Output: [1, 2, 2, 3, 2]

Explanation: Uniform sampling over 3 items maps each value by bucket index `int(r * 3)`.

Input: (['x', 'y'], [0.5, 0.5], 0, [])

Expected Output: []

Explanation: Requesting zero samples should return an empty list.

Input: ([7], None, 3, [0.1, 0.5, 0.999])

Expected Output: [7, 7, 7]

Explanation: With one item, every sample returns that same item.

Input: (['a', 'b'], [0.5, 0.5], 3, [0.1, 0.2])

Expected Output: 'INVALID'

Explanation: The number of random values must exactly match `size`.

Hints

  1. Precompute cumulative probabilities once instead of rescanning `p` from scratch for every sample.
  2. After building prefix sums, binary search can locate the bucket for each random value.
Last updated: Apr 21, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Maximize Boxes Stored Through One Entrance - Pinterest (medium)
  • Solve Multiple Coding Interview Problems - Pinterest (medium)
  • Implement a Sparse Matrix Class - Pinterest (medium)
  • Assign Pins to Shortest Columns - Pinterest (medium)
  • Design Hierarchical Permission Checks - Pinterest (medium)