Minimize L2, L1, and quantile losses
Company: Google
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of statistical estimators and optimization principles for L2, L1, and tilted absolute (quantile) loss functions, including use of derivatives and subgradients, order-statistics, algorithmic complexity, and robustness considerations.
Part 1: Minimize Squared Deviations
Constraints
- 1 <= len(xs) <= 200000
- -10^12 <= xs[i] <= 10^12
- All xs[i] are finite real numbers.
- For non-integer answers, an absolute or relative error of 1e-9 is acceptable.
Examples
Input: ([1, 2, 3, 4],)
Expected Output: 2.5
Explanation: The mean is (1 + 2 + 3 + 4) / 4 = 2.5.
Input: ([42],)
Expected Output: 42.0
Explanation: With one element, the unique minimizer is that element.
Input: ([-3, -1, 2],)
Expected Output: -0.6666666666666666
Explanation: The mean is (-3 - 1 + 2) / 3 = -2/3.
Input: ([1000000000, -1000000000, 3],)
Expected Output: 1.0
Explanation: The large positive and negative values cancel, leaving 3 / 3 = 1.
Solution
def solution(xs):
if not xs:
raise ValueError('xs must be non-empty')
import math
return math.fsum(xs) / len(xs)Time complexity: O(n). Space complexity: O(1).
Hints
- Differentiate sum((xi - theta)^2) with respect to theta and set the result to zero.
- After simplification, the answer depends only on the sum of the values and the number of values.
Part 2: Minimize Absolute Deviations
Constraints
- 1 <= len(xs) <= 200000
- -10^12 <= xs[i] <= 10^12
- All xs[i] are finite real numbers.
- Expected O(n) time is desired; randomized quickselect is acceptable.
Examples
Input: ([3, 1, 2],)
Expected Output: [2, 2]
Explanation: Sorted order is [1, 2, 3]. The unique median is 2.
Input: ([7],)
Expected Output: [7, 7]
Explanation: A single value is the only minimizer.
Input: ([1, 2, 100, 3],)
Expected Output: [2, 3]
Explanation: Sorted order is [1, 2, 3, 100]. Any theta in [2, 3] minimizes the L1 loss.
Input: ([5, 5, 1, 9, 5, 2],)
Expected Output: [5, 5]
Explanation: Sorted order is [1, 2, 5, 5, 5, 9]. Both middle order statistics are 5.
Input: ([-2, -1, 4, 10],)
Expected Output: [-1, 4]
Explanation: For an even-length array, the L1 objective is flat between the two middle values.
Solution
def solution(xs):
if not xs:
raise ValueError('xs must be non-empty')
import random
def quickselect(values, k):
arr = list(values)
while True:
pivot = random.choice(arr)
lows = []
highs = []
pivots = []
for v in arr:
if v < pivot:
lows.append(v)
elif v > pivot:
highs.append(v)
else:
pivots.append(v)
if k < len(lows):
arr = lows
elif k < len(lows) + len(pivots):
return pivot
else:
k -= len(lows) + len(pivots)
arr = highs
n = len(xs)
left_index = (n - 1) // 2
right_index = n // 2
left = quickselect(xs, left_index)
right = quickselect(xs, right_index)
return [left, right]Time complexity: Expected O(n); worst-case O(n^2) for randomized quickselect. Space complexity: O(n).
Hints
- The subgradient of sum(|xi - theta|) depends on how many points are less than, equal to, and greater than theta.
- The endpoints of the minimizer interval are the two middle order statistics: indices (n - 1) // 2 and n // 2 in sorted order.
Part 3: Minimize Tilted Absolute Loss for the 90th Percentile
Constraints
- 1 <= len(xs) <= 200000
- -10^12 <= xs[i] <= 10^12
- All xs[i] are finite real numbers.
- Use tau = 9/10 exactly for rank calculations.
- Expected O(n) time is desired; randomized quickselect is acceptable.
Examples
Input: ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],)
Expected Output: [9, 10]
Explanation: Here 0.9 * n = 9 exactly, so the loss is flat between the 9th and 10th sorted values.
Input: ([5],)
Expected Output: [5, 5]
Explanation: With one value, the only possible 90th-percentile minimizer is that value.
Input: ([1, 2, 3, 4, 100],)
Expected Output: [100, 100]
Explanation: Here 0.9 * 5 = 4.5, so the minimizer is the ceil(4.5) = 5th sorted value.
Input: ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10000],)
Expected Output: [10, 10]
Explanation: Here 0.9 * 11 = 9.9, so the 10th sorted value is selected; the single largest outlier does not change it.
Input: ([1, 2, 3, 4, 5, 5, 5, 5, 5, 100],)
Expected Output: [5, 100]
Explanation: Here 0.9 * n = 9 exactly. The 9th sorted value is 5 and the 10th is 100, so every theta in [5, 100] minimizes the tilted loss.
Solution
def solution(xs):
if not xs:
raise ValueError('xs must be non-empty')
import random
def quickselect(values, k):
arr = list(values)
while True:
pivot = random.choice(arr)
lows = []
highs = []
pivots = []
for v in arr:
if v < pivot:
lows.append(v)
elif v > pivot:
highs.append(v)
else:
pivots.append(v)
if k < len(lows):
arr = lows
elif k < len(lows) + len(pivots):
return pivot
else:
k -= len(lows) + len(pivots)
arr = highs
n = len(xs)
scaled = 9 * n
low_rank = (scaled + 9) // 10
high_rank = scaled // 10 + 1
low = quickselect(xs, low_rank - 1)
high = quickselect(xs, high_rank - 1)
return [low, high]Time complexity: Expected O(n); worst-case O(n^2) for randomized quickselect. Space complexity: O(n).
Hints
- At a minimizer theta, the counts must satisfy: number of values less than theta <= 0.9n <= number of values less than or equal to theta.
- In 1-based sorted order, the minimizer interval is [x_(ceil(0.9n)), x_(floor(0.9n) + 1)]. Use integer arithmetic with 9n / 10.