PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of statistical resampling (jackknife), probabilistic sampling algorithms including with- and without-replacement selection, and proficiency with Python data structures and numerical libraries for implementing estimators.

  • medium
  • Pinterest
  • Coding & Algorithms
  • Data Scientist

Implement Jackknife and Random Choice

Company: Pinterest

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Write Python code for two implementation tasks without using `np.random.choice` directly: 1. **Jackknife-style resampling.** Given a one-dimensional collection of observations `x` of length `n`, generate all `n` leave-one-out samples. For a supplied statistic function `f(sample)`, compute the statistic on each leave-one-out sample and return the jackknife estimate of bias and variance. Your implementation should work when the input is either a Python list or a pandas DataFrame with one row per observation. 2. **Random choice sampler.** Given an array `values`, a probability vector `p` such that `sum(p) = 1`, and an integer `k`, implement a simplified version of `np.random.choice` that returns `k` samples according to `p`. Discuss how your approach differs for sampling with replacement versus without replacement, and state the expected time complexity.

Quick Answer: This question evaluates understanding of statistical resampling (jackknife), probabilistic sampling algorithms including with- and without-replacement selection, and proficiency with Python data structures and numerical libraries for implementing estimators.

Part 1: Jackknife Bias and Variance

Given a one-dimensional sample x and a statistic selector statistic, generate every leave-one-out sample by removing exactly one observation at a time. Compute the statistic on each leave-one-out sample in original order. Let theta_hat be the statistic on the full sample, and let theta_dot be the average of the leave-one-out statistics. Return [[loo_stats], [bias, variance]], where bias = (n - 1) * (theta_dot - theta_hat) and variance = ((n - 1) / n) * sum((theta_i - theta_dot)^2). If x has fewer than 2 observations, return [[], [0.0, 0.0]]. Round every returned number to 6 decimal places. For testing, statistic will be one of "mean", "sum", "min", "max", or "median". A Python implementation may also optionally accept a pandas DataFrame with exactly one numeric column and one row per observation.

Constraints

  • 0 <= len(x) <= 1000
  • Each element of x is numeric
  • statistic is one of: "mean", "sum", "min", "max", "median"
  • If a pandas DataFrame is provided, it has exactly one numeric column

Examples

Input: ([1, 2, 3, 4], "mean")

Expected Output: [[3.0, 2.666667, 2.333333, 2.0], [0.0, 0.416667]]

Explanation: Leave-one-out means are computed in index order. The mean of those values matches the full-sample mean, so the bias is 0.

Input: ([2, 2, 2], "sum")

Expected Output: [[4.0, 4.0, 4.0], [-4.0, 0.0]]

Explanation: Each leave-one-out sample sums to 4, while the full sample sums to 6.

Input: ([5, 7], "mean")

Expected Output: [[7.0, 5.0], [0.0, 1.0]]

Explanation: Removing 5 leaves [7], and removing 7 leaves [5].

Input: ([5], "mean")

Expected Output: [[], [0.0, 0.0]]

Explanation: With fewer than 2 observations, no leave-one-out statistic is defined for this problem.

Input: ([], "sum")

Expected Output: [[], [0.0, 0.0]]

Explanation: Empty input is handled as a boundary case.

Hints

  1. Compute the statistic on the full sample once, then compute it again after removing each index.
  2. The jackknife formulas use the average of the leave-one-out estimates, not the original estimate directly.

Part 2: Weighted Random Choice Sampler

Implement a simplified version of weighted random choice without using numpy. To make the problem deterministic, you are given random_values, a list of numbers in [0, 1), and you must use them as the successive random draws. For each draw, select the first index whose cumulative probability is strictly greater than the current random value. If replace is true, probabilities stay the same for all draws. If replace is false, remove the chosen item, remove its probability, and renormalize the remaining probabilities before the next draw. Return the sampled values in draw order. Return [] for invalid input or if replace is false and k is larger than the number of available values.

Constraints

  • 0 <= len(values) <= 10000
  • len(values) == len(p)
  • 0 <= k <= 10000
  • All probabilities in p are non-negative
  • sum(p) > 0
  • len(random_values) >= k
  • If replace is false and k > len(values), return []

Examples

Input: ([10, 20, 30], [0.2, 0.5, 0.3], 5, True, [0.05, 0.2, 0.69, 0.7, 0.99])

Expected Output: [10, 20, 20, 30, 30]

Explanation: Using cumulative probabilities [0.2, 0.7, 1.0], each random number maps to one value.

Input: ([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4], 3, False, [0.05, 0.5, 0.75])

Expected Output: [1, 3, 4]

Explanation: After each selection, the remaining probabilities are renormalized before the next draw.

Input: ([1, 2, 3], [0.3, 0.3, 0.4], 0, True, [])

Expected Output: []

Explanation: Requesting zero samples should return an empty list.

Input: ([1, 2], [0.5, 0.5], 3, False, [0.1, 0.2, 0.3])

Expected Output: []

Explanation: Sampling without replacement cannot draw more items than exist.

Input: ([7, 8, 9], [0.0, 0.4, 0.6], 3, True, [0.0, 0.39, 0.99])

Expected Output: [8, 8, 9]

Explanation: The value with probability 0 is never selected.

Hints

  1. For sampling with replacement, build one prefix-sum array and binary-search each random number.
  2. For sampling without replacement, remove the chosen item and recompute selection using the remaining weights.
Last updated: May 19, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Hierarchical Access Control for an Advertising Platform - Pinterest (medium)
  • First Word Matching Each Prefix Query - Pinterest (medium)
  • Maximize Boxes Stored Through One Entrance - Pinterest (medium)
  • Solve Multiple Coding Interview Problems - Pinterest (medium)
  • Implement a Sparse Matrix Class - Pinterest (medium)