Implement Jackknife and Random Choice
Company: Pinterest
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of statistical resampling (jackknife), probabilistic sampling algorithms including with- and without-replacement selection, and proficiency with Python data structures and numerical libraries for implementing estimators.
Part 1: Jackknife Bias and Variance
Constraints
- 0 <= len(x) <= 1000
- Each element of x is numeric
- statistic is one of: "mean", "sum", "min", "max", "median"
- If a pandas DataFrame is provided, it has exactly one numeric column
Examples
Input: ([1, 2, 3, 4], "mean")
Expected Output: [[3.0, 2.666667, 2.333333, 2.0], [0.0, 0.416667]]
Explanation: Leave-one-out means are computed in index order. The mean of those values matches the full-sample mean, so the bias is 0.
Input: ([2, 2, 2], "sum")
Expected Output: [[4.0, 4.0, 4.0], [-4.0, 0.0]]
Explanation: Each leave-one-out sample sums to 4, while the full sample sums to 6.
Input: ([5, 7], "mean")
Expected Output: [[7.0, 5.0], [0.0, 1.0]]
Explanation: Removing 5 leaves [7], and removing 7 leaves [5].
Input: ([5], "mean")
Expected Output: [[], [0.0, 0.0]]
Explanation: With fewer than 2 observations, no leave-one-out statistic is defined for this problem.
Input: ([], "sum")
Expected Output: [[], [0.0, 0.0]]
Explanation: Empty input is handled as a boundary case.
Hints
- Compute the statistic on the full sample once, then compute it again after removing each index.
- The jackknife formulas use the average of the leave-one-out estimates, not the original estimate directly.
Part 2: Weighted Random Choice Sampler
Constraints
- 0 <= len(values) <= 10000
- len(values) == len(p)
- 0 <= k <= 10000
- All probabilities in p are non-negative
- sum(p) > 0
- len(random_values) >= k
- If replace is false and k > len(values), return []
Examples
Input: ([10, 20, 30], [0.2, 0.5, 0.3], 5, True, [0.05, 0.2, 0.69, 0.7, 0.99])
Expected Output: [10, 20, 20, 30, 30]
Explanation: Using cumulative probabilities [0.2, 0.7, 1.0], each random number maps to one value.
Input: ([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4], 3, False, [0.05, 0.5, 0.75])
Expected Output: [1, 3, 4]
Explanation: After each selection, the remaining probabilities are renormalized before the next draw.
Input: ([1, 2, 3], [0.3, 0.3, 0.4], 0, True, [])
Expected Output: []
Explanation: Requesting zero samples should return an empty list.
Input: ([1, 2], [0.5, 0.5], 3, False, [0.1, 0.2, 0.3])
Expected Output: []
Explanation: Sampling without replacement cannot draw more items than exist.
Input: ([7, 8, 9], [0.0, 0.4, 0.6], 3, True, [0.0, 0.39, 0.99])
Expected Output: [8, 8, 9]
Explanation: The value with probability 0 is never selected.
Hints
- For sampling with replacement, build one prefix-sum array and binary-search each random number.
- For sampling without replacement, remove the chosen item and recompute selection using the remaining weights.