How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Pinterest.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Pinterest during technical interviews.

Implement Jackknife and Random Choice | Pinterest Coding Question

Quick Overview

This question evaluates understanding of statistical resampling (jackknife), probabilistic sampling algorithms including with- and without-replacement selection, and proficiency with Python data structures and numerical libraries for implementing estimators.

Implement Jackknife and Random Choice

Company: Pinterest

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Write Python code for two implementation tasks without using `np.random.choice` directly: 1. **Jackknife-style resampling.** Given a one-dimensional collection of observations `x` of length `n`, generate all `n` leave-one-out samples. For a supplied statistic function `f(sample)`, compute the statistic on each leave-one-out sample and return the jackknife estimate of bias and variance. Your implementation should work when the input is either a Python list or a pandas DataFrame with one row per observation. 2. **Random choice sampler.** Given an array `values`, a probability vector `p` such that `sum(p) = 1`, and an integer `k`, implement a simplified version of `np.random.choice` that returns `k` samples according to `p`. Discuss how your approach differs for sampling with replacement versus without replacement, and state the expected time complexity.

Quick Answer: This question evaluates understanding of statistical resampling (jackknife), probabilistic sampling algorithms including with- and without-replacement selection, and proficiency with Python data structures and numerical libraries for implementing estimators.

Part 1: Jackknife Bias and Variance

Given a one-dimensional sample x and a statistic selector statistic, generate every leave-one-out sample by removing exactly one observation at a time. Compute the statistic on each leave-one-out sample in original order. Let theta_hat be the statistic on the full sample, and let theta_dot be the average of the leave-one-out statistics. Return [[loo_stats], [bias, variance]], where bias = (n - 1) * (theta_dot - theta_hat) and variance = ((n - 1) / n) * sum((theta_i - theta_dot)^2). If x has fewer than 2 observations, return [[], [0.0, 0.0]]. Round every returned number to 6 decimal places. For testing, statistic will be one of "mean", "sum", "min", "max", or "median". A Python implementation may also optionally accept a pandas DataFrame with exactly one numeric column and one row per observation.

Constraints

0 <= len(x) <= 1000
Each element of x is numeric
statistic is one of: "mean", "sum", "min", "max", "median"
If a pandas DataFrame is provided, it has exactly one numeric column

Examples

Input: ([1, 2, 3, 4], "mean")

Expected Output: [[3.0, 2.666667, 2.333333, 2.0], [0.0, 0.416667]]

Explanation: Leave-one-out means are computed in index order. The mean of those values matches the full-sample mean, so the bias is 0.

Input: ([2, 2, 2], "sum")

Expected Output: [[4.0, 4.0, 4.0], [-4.0, 0.0]]

Explanation: Each leave-one-out sample sums to 4, while the full sample sums to 6.

Input: ([5, 7], "mean")

Expected Output: [[7.0, 5.0], [0.0, 1.0]]

Explanation: Removing 5 leaves [7], and removing 7 leaves [5].

Input: ([5], "mean")

Expected Output: [[], [0.0, 0.0]]

Explanation: With fewer than 2 observations, no leave-one-out statistic is defined for this problem.

Input: ([], "sum")

Expected Output: [[], [0.0, 0.0]]

Explanation: Empty input is handled as a boundary case.

Hints

Compute the statistic on the full sample once, then compute it again after removing each index.
The jackknife formulas use the average of the leave-one-out estimates, not the original estimate directly.

Part 2: Weighted Random Choice Sampler

Implement a simplified version of weighted random choice without using numpy. To make the problem deterministic, you are given random_values, a list of numbers in [0, 1), and you must use them as the successive random draws. For each draw, select the first index whose cumulative probability is strictly greater than the current random value. If replace is true, probabilities stay the same for all draws. If replace is false, remove the chosen item, remove its probability, and renormalize the remaining probabilities before the next draw. Return the sampled values in draw order. Return [] for invalid input or if replace is false and k is larger than the number of available values.

Constraints

0 <= len(values) <= 10000
len(values) == len(p)
0 <= k <= 10000
All probabilities in p are non-negative
sum(p) > 0
len(random_values) >= k
If replace is false and k > len(values), return []

Examples

Input: ([10, 20, 30], [0.2, 0.5, 0.3], 5, True, [0.05, 0.2, 0.69, 0.7, 0.99])

Expected Output: [10, 20, 20, 30, 30]

Explanation: Using cumulative probabilities [0.2, 0.7, 1.0], each random number maps to one value.

Input: ([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4], 3, False, [0.05, 0.5, 0.75])

Expected Output: [1, 3, 4]

Explanation: After each selection, the remaining probabilities are renormalized before the next draw.

Input: ([1, 2, 3], [0.3, 0.3, 0.4], 0, True, [])

Expected Output: []

Explanation: Requesting zero samples should return an empty list.

Input: ([1, 2], [0.5, 0.5], 3, False, [0.1, 0.2, 0.3])

Expected Output: []

Explanation: Sampling without replacement cannot draw more items than exist.

Input: ([7, 8, 9], [0.0, 0.4, 0.6], 3, True, [0.0, 0.39, 0.99])

Expected Output: [8, 8, 9]

Explanation: The value with probability 0 is never selected.

Hints

For sampling with replacement, build one prefix-sum array and binary-search each random number.
For sampling without replacement, remove the chosen item and recompute selection using the remaining weights.

Quick Overview