Implement Sampling and Minimize Loss in Numerical Coding
Company: Google
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
##### Scenario
Numerical coding challenges on sampling and loss minimization.
##### Question
a) Implement functions to sample from truncated normal distributions for x>1, 4<x<4.05, and x>4. b) For an array X, find the value minimizing Σ(x−θ)², then the value minimizing Σ|x−θ|, and derive the loss that yields the 90th percentile.
##### Hints
Use rejection or CDF-inverse methods; derivatives show mean, median, and quantile solutions.
Quick Answer: This question evaluates proficiency in probability and numerical methods, focusing on sampling from truncated distributions and analytical properties of estimators under various loss functions.
Given an array X of n real numbers and a mode, return the value θ that minimizes a specified loss. Modes: (1) 'L2': minimize Σ(xi−θ)^2; return the arithmetic mean of X. (2) 'L1': minimize Σ|xi−θ|; return the smallest median (lower median when n is even). (3) 'quantile': return the τ-quantile, defined as the smallest θ such that at least τ fraction of elements are ≤ θ (nearest-rank/left quantile). If X is empty, raise ValueError. For mode 'quantile', τ must satisfy 0 < τ ≤ 1 (default τ = 0.9).
Constraints
- 1 <= n <= 200000
- |xi| <= 1e9
- mode in {'L2','L1','quantile'}
- For 'quantile' mode: 0 < tau <= 1
- Return the smallest minimizer if multiple minimizers exist
Hints
- For L2, set derivative of Σ(x−θ)^2 to zero to get θ = mean(X).
- For L1, any median minimizes the sum of absolute deviations; choose the lower median for determinism.
- For the τ-quantile, use k = ceil(τ*n) - 1 on the sorted order; compute k-th order statistic via Quickselect to avoid full sorting.