PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Statistics & Math/Google

Generate values by weighted probabilities

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of probabilistic sampling, numerical stability, input validation, and algorithmic trade-offs for streaming weighted random draws, including generator determinism via seeding and methods such as CDF/binary-search versus alias for performance.

  • medium
  • Google
  • Statistics & Math
  • Machine Learning Engineer

Generate values by weighted probabilities

Company: Google

Role: Machine Learning Engineer

Category: Statistics & Math

Difficulty: medium

Interview Round: Onsite

Given a list of distinct integers and a matching list of probabilities (weights that sum to 1 within a small tolerance), implement a Python generator that yields values according to the specified probability distribution. The generator should support streaming (potentially infinite) draws, validate inputs (length match, nonnegative weights, handling of zeros), and address floating‑point rounding. Implement an efficient sampling method (e.g., prefix sums with binary search or the alias method) and explain the time/space trade‑offs. Provide unit tests that include deterministic checks via random seeding and statistical verification of the output distribution over many samples.

Quick Answer: This question evaluates understanding of probabilistic sampling, numerical stability, input validation, and algorithmic trade-offs for streaming weighted random draws, including generator determinism via seeding and methods such as CDF/binary-search versus alias for performance.

Related Interview Questions

  • Estimate weather’s effect on mental health - Google (easy)
  • Explain Bootstrap and Statistical Inference - Google (hard)
  • Explain Bootstrap and Prove Uniformity - Google (hard)
  • Can bootstrap help reduce variance - Google (medium)
  • Compute precision under noisy annotators - Google (medium)
Google logo
Google
Sep 6, 2025, 12:00 AM
Machine Learning Engineer
Onsite
Statistics & Math
4
0

Weighted Random Sampling Generator (Streaming)

You are given:

  • A list of distinct integers values.
  • A matching list of nonnegative probabilities (weights) that should sum to 1 within a small tolerance.

Implement a Python solution that supports streaming (potentially infinite) weighted draws, validates inputs, and is numerically robust.

Requirements

  1. Input validation
    • values and weights must be the same nonzero length.
    • values must contain distinct integers.
    • weights must be nonnegative; zero-weight items are allowed and must never be sampled.
    • The sum of weights should be 1 within a small tolerance; handle floating-point rounding and renormalize when needed.
  2. Generator interface
    • Provide a generator that yields an infinite stream of draws following the specified distribution.
    • Allow deterministic behavior via a random seed.
  3. Efficient sampling methods
    • Implement at least one efficient method and optionally both:
      • Prefix sums with binary search (CDF + bisect): O(n) preprocess, O(log n) per sample, O(n) space.
      • Alias method (Vose/Walker): O(n) preprocess, O(1) per sample, O(n) space.
    • Explain time/space trade-offs and when to use each.
  4. Numerical robustness
    • Clamp tiny negative weights caused by rounding to zero.
    • Renormalize weights if their sum differs from 1 by more than a tolerance.
    • Ensure the final CDF ends at exactly 1.0 to avoid edge-case misses.
  5. Tests
    • Unit tests with:
      • Deterministic checks via random seeding (same seed → same sequence; different seeds → different sequences with high probability).
      • Statistical verification over many samples (e.g., 50k–200k) that empirical frequencies are close to target probabilities.
      • Edge-case validation (zeros, invalid inputs).

Provide clean, well-documented code and explain the design choices and trade-offs.

Solution

Show

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Statistics & Math•More Google•More Machine Learning Engineer•Google Machine Learning Engineer•Google Statistics & Math•Machine Learning Engineer Statistics & Math
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.