Implement Sampling and Minimize Loss in Numerical Coding
Company: Google
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
##### Scenario
Numerical coding challenges on sampling and loss minimization.
##### Question
a) Implement functions to sample from truncated normal distributions for x>1, 4<x<4.05, and x>4. b) For an array X, find the value minimizing Σ(x−θ)², then the value minimizing Σ|x−θ|, and derive the loss that yields the 90th percentile.
##### Hints
Use rejection or CDF-inverse methods; derivatives show mean, median, and quantile solutions.
Quick Answer: This question evaluates proficiency in probability and numerical methods, focusing on sampling from truncated distributions and analytical properties of estimators under various loss functions.
Given an array X of n real numbers and a mode, return the value θ that minimizes a specified loss. Modes: (1) 'L2': minimize Σ(xi−θ)^2; return the arithmetic mean of X. (2) 'L1': minimize Σ|xi−θ|; return the smallest median (lower median when n is even). (3) 'quantile': return the τ-quantile, defined as the smallest θ such that at least τ fraction of elements are ≤ θ (nearest-rank/left quantile). If X is empty, raise ValueError. For mode 'quantile', τ must satisfy 0 < τ ≤ 1 (default τ = 0.9).
Constraints
- 1 <= n <= 200000
- |xi| <= 1e9
- mode in {'L2','L1','quantile'}
- For 'quantile' mode: 0 < tau <= 1
- Return the smallest minimizer if multiple minimizers exist
Solution
from math import ceil
def minimize_loss(X: list, mode: str, tau: float = 0.9) -> float:
if not isinstance(X, list) or len(X) == 0:
raise ValueError("X must be a non-empty list")
n = len(X)
if mode not in ("L2", "L1", "quantile"):
raise ValueError("mode must be one of 'L2', 'L1', 'quantile'")
if mode == "L2":
return sum(X) / n
if mode == "quantile" and not (0 < tau <= 1):
raise ValueError("tau must be in (0, 1] for 'quantile' mode")
# Select k-th order statistic (0-indexed) using Quickselect.
a = list(X) # work on a copy to avoid mutating input
if mode == "L1":
k = (n - 1) // 2 # lower median
else: # mode == 'quantile'
k = int(ceil(tau * n) - 1)
def partition(arr, lo, hi, pivot_index):
pv = arr[pivot_index]
arr[pivot_index], arr[hi] = arr[hi], arr[pivot_index]
store = lo
for i in range(lo, hi):
if arr[i] < pv:
arr[store], arr[i] = arr[i], arr[store]
store += 1
arr[store], arr[hi] = arr[hi], arr[store]
return store
lo, hi = 0, n - 1
while True:
if lo == hi:
return a[lo]
pivot_index = (lo + hi) // 2
p = partition(a, lo, hi, pivot_index)
if k == p:
return a[p]
elif k < p:
hi = p - 1
else:
lo = p + 1