PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of statistical estimators and optimization principles for L2, L1, and tilted absolute (quantile) loss functions, including use of derivatives and subgradients, order-statistics, algorithmic complexity, and robustness considerations.

  • Medium
  • Google
  • Coding & Algorithms
  • Data Scientist

Minimize L2, L1, and quantile losses

Company: Google

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Given an array X of n real numbers, derive the value θ that minimizes the sum of squared deviations Σ(xi−θ)² (mean) and the sum of absolute deviations Σ|xi−θ| (median). Without knowing these results, show how to find θ by setting the derivative or subgradient to zero, and give an algorithm to compute each (mean in O(n), median via selection in expected O(n)). Generalize to the 90th percentile by minimizing the tilted absolute loss Σ ρτ(xi−θ) with τ = 0.9, where ρτ(r) = r·(τ − I[r < 0]); characterize the minimizer and discuss tie handling and robustness to outliers.

Quick Answer: This question evaluates understanding of statistical estimators and optimization principles for L2, L1, and tilted absolute (quantile) loss functions, including use of derivatives and subgradients, order-statistics, algorithmic complexity, and robustness considerations.

Part 1: Minimize Squared Deviations

Given a non-empty list xs of real numbers, find the value theta that minimizes the squared-error objective sum((xi - theta)^2) over all xi in xs. The minimizer is unique. Return that minimizing value as a float. Your algorithm should run in one pass over the array.

Constraints

  • 1 <= len(xs) <= 200000
  • -10^12 <= xs[i] <= 10^12
  • All xs[i] are finite real numbers.
  • For non-integer answers, an absolute or relative error of 1e-9 is acceptable.

Examples

Input: ([1, 2, 3, 4],)

Expected Output: 2.5

Explanation: The mean is (1 + 2 + 3 + 4) / 4 = 2.5.

Input: ([42],)

Expected Output: 42.0

Explanation: With one element, the unique minimizer is that element.

Input: ([-3, -1, 2],)

Expected Output: -0.6666666666666666

Explanation: The mean is (-3 - 1 + 2) / 3 = -2/3.

Input: ([1000000000, -1000000000, 3],)

Expected Output: 1.0

Explanation: The large positive and negative values cancel, leaving 3 / 3 = 1.

Solution

def solution(xs):
    if not xs:
        raise ValueError('xs must be non-empty')
    import math
    return math.fsum(xs) / len(xs)

Time complexity: O(n). Space complexity: O(1).

Hints

  1. Differentiate sum((xi - theta)^2) with respect to theta and set the result to zero.
  2. After simplification, the answer depends only on the sum of the values and the number of values.

Part 2: Minimize Absolute Deviations

Given a non-empty list xs of real numbers, characterize all values theta that minimize the absolute-error objective sum(|xi - theta|). Return a two-element list [L, R], where every theta in the closed interval [L, R] is a minimizer, and no theta outside that interval is a minimizer. For an odd number of elements, L and R will be equal. For an even number of elements, the interval may span the two middle order statistics. Use selection-style logic rather than relying on full sorting.

Constraints

  • 1 <= len(xs) <= 200000
  • -10^12 <= xs[i] <= 10^12
  • All xs[i] are finite real numbers.
  • Expected O(n) time is desired; randomized quickselect is acceptable.

Examples

Input: ([3, 1, 2],)

Expected Output: [2, 2]

Explanation: Sorted order is [1, 2, 3]. The unique median is 2.

Input: ([7],)

Expected Output: [7, 7]

Explanation: A single value is the only minimizer.

Input: ([1, 2, 100, 3],)

Expected Output: [2, 3]

Explanation: Sorted order is [1, 2, 3, 100]. Any theta in [2, 3] minimizes the L1 loss.

Input: ([5, 5, 1, 9, 5, 2],)

Expected Output: [5, 5]

Explanation: Sorted order is [1, 2, 5, 5, 5, 9]. Both middle order statistics are 5.

Input: ([-2, -1, 4, 10],)

Expected Output: [-1, 4]

Explanation: For an even-length array, the L1 objective is flat between the two middle values.

Solution

def solution(xs):
    if not xs:
        raise ValueError('xs must be non-empty')
    import random

    def quickselect(values, k):
        arr = list(values)
        while True:
            pivot = random.choice(arr)
            lows = []
            highs = []
            pivots = []
            for v in arr:
                if v < pivot:
                    lows.append(v)
                elif v > pivot:
                    highs.append(v)
                else:
                    pivots.append(v)
            if k < len(lows):
                arr = lows
            elif k < len(lows) + len(pivots):
                return pivot
            else:
                k -= len(lows) + len(pivots)
                arr = highs

    n = len(xs)
    left_index = (n - 1) // 2
    right_index = n // 2
    left = quickselect(xs, left_index)
    right = quickselect(xs, right_index)
    return [left, right]

Time complexity: Expected O(n); worst-case O(n^2) for randomized quickselect. Space complexity: O(n).

Hints

  1. The subgradient of sum(|xi - theta|) depends on how many points are less than, equal to, and greater than theta.
  2. The endpoints of the minimizer interval are the two middle order statistics: indices (n - 1) // 2 and n // 2 in sorted order.

Part 3: Minimize Tilted Absolute Loss for the 90th Percentile

Given a non-empty list xs of real numbers, consider the tilted absolute loss with tau = 0.9: rho_tau(r) = r * (tau - I[r < 0]), where r = xi - theta. Find all theta values that minimize sum(rho_tau(xi - theta)). Return a two-element list [L, R], where every theta in [L, R] is a minimizer. This interval is the empirical 90th-percentile minimizer set. If 0.9 * n is not an integer, the interval collapses to one order statistic. If 0.9 * n is an integer, the loss is flat between two adjacent order statistics. Duplicate values may also collapse the interval. This quantile is much less sensitive to extreme high outliers than the mean, as long as fewer than about 10% of the data are affected.

Constraints

  • 1 <= len(xs) <= 200000
  • -10^12 <= xs[i] <= 10^12
  • All xs[i] are finite real numbers.
  • Use tau = 9/10 exactly for rank calculations.
  • Expected O(n) time is desired; randomized quickselect is acceptable.

Examples

Input: ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],)

Expected Output: [9, 10]

Explanation: Here 0.9 * n = 9 exactly, so the loss is flat between the 9th and 10th sorted values.

Input: ([5],)

Expected Output: [5, 5]

Explanation: With one value, the only possible 90th-percentile minimizer is that value.

Input: ([1, 2, 3, 4, 100],)

Expected Output: [100, 100]

Explanation: Here 0.9 * 5 = 4.5, so the minimizer is the ceil(4.5) = 5th sorted value.

Input: ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10000],)

Expected Output: [10, 10]

Explanation: Here 0.9 * 11 = 9.9, so the 10th sorted value is selected; the single largest outlier does not change it.

Input: ([1, 2, 3, 4, 5, 5, 5, 5, 5, 100],)

Expected Output: [5, 100]

Explanation: Here 0.9 * n = 9 exactly. The 9th sorted value is 5 and the 10th is 100, so every theta in [5, 100] minimizes the tilted loss.

Solution

def solution(xs):
    if not xs:
        raise ValueError('xs must be non-empty')
    import random

    def quickselect(values, k):
        arr = list(values)
        while True:
            pivot = random.choice(arr)
            lows = []
            highs = []
            pivots = []
            for v in arr:
                if v < pivot:
                    lows.append(v)
                elif v > pivot:
                    highs.append(v)
                else:
                    pivots.append(v)
            if k < len(lows):
                arr = lows
            elif k < len(lows) + len(pivots):
                return pivot
            else:
                k -= len(lows) + len(pivots)
                arr = highs

    n = len(xs)
    scaled = 9 * n
    low_rank = (scaled + 9) // 10
    high_rank = scaled // 10 + 1
    low = quickselect(xs, low_rank - 1)
    high = quickselect(xs, high_rank - 1)
    return [low, high]

Time complexity: Expected O(n); worst-case O(n^2) for randomized quickselect. Space complexity: O(n).

Hints

  1. At a minimizer theta, the counts must satisfy: number of values less than theta <= 0.9n <= number of values less than or equal to theta.
  2. In 1-based sorted order, the minimizer interval is [x_(ceil(0.9n)), x_(floor(0.9n) + 1)]. Use integer arithmetic with 9n / 10.
Last updated: Jun 15, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Solve Rooms and Top-K Streams - Google (medium)
  • Find Containing Range - Google (medium)
  • Rearrange Tasks With Cooldown - Google (medium)
  • Implement Employee Management and Expression Evaluation - Google (medium)
  • Solve Three Array and Matrix Path Problems - Google (medium)