How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Google.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Google during technical interviews.

Minimize L2, L1, and quantile losses | Google Coding Question

Quick Overview

This question evaluates understanding of statistical estimators and optimization principles for L2, L1, and tilted absolute (quantile) loss functions, including use of derivatives and subgradients, order-statistics, algorithmic complexity, and robustness considerations.

Minimize L2, L1, and quantile losses

Company: Google

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Given an array X of n real numbers, derive the value θ that minimizes the sum of squared deviations Σ(xi−θ)² (mean) and the sum of absolute deviations Σ|xi−θ| (median). Without knowing these results, show how to find θ by setting the derivative or subgradient to zero, and give an algorithm to compute each (mean in O(n), median via selection in expected O(n)). Generalize to the 90th percentile by minimizing the tilted absolute loss Σ ρτ(xi−θ) with τ = 0.9, where ρτ(r) = r·(τ − I[r < 0]); characterize the minimizer and discuss tie handling and robustness to outliers.

Quick Answer: This question evaluates understanding of statistical estimators and optimization principles for L2, L1, and tilted absolute (quantile) loss functions, including use of derivatives and subgradients, order-statistics, algorithmic complexity, and robustness considerations.

Part 1: Minimize Squared Deviations

Given a non-empty list xs of real numbers, find the value theta that minimizes the squared-error objective sum((xi - theta)^2) over all xi in xs. The minimizer is unique. Return that minimizing value as a float. Your algorithm should run in one pass over the array.

Constraints

1 <= len(xs) <= 200000
-10^12 <= xs[i] <= 10^12
All xs[i] are finite real numbers.
For non-integer answers, an absolute or relative error of 1e-9 is acceptable.

Examples

Input: ([1, 2, 3, 4],)

Expected Output: 2.5

Explanation: The mean is (1 + 2 + 3 + 4) / 4 = 2.5.

Input: ([42],)

Expected Output: 42.0

Explanation: With one element, the unique minimizer is that element.

Input: ([-3, -1, 2],)

Expected Output: -0.6666666666666666

Explanation: The mean is (-3 - 1 + 2) / 3 = -2/3.

Input: ([1000000000, -1000000000, 3],)

Expected Output: 1.0

Explanation: The large positive and negative values cancel, leaving 3 / 3 = 1.

Solution

def solution(xs):
    if not xs:
        raise ValueError('xs must be non-empty')
    import math
    return math.fsum(xs) / len(xs)

Time complexity: O(n). Space complexity: O(1).

Hints

Differentiate sum((xi - theta)^2) with respect to theta and set the result to zero.
After simplification, the answer depends only on the sum of the values and the number of values.

Part 2: Minimize Absolute Deviations

Given a non-empty list xs of real numbers, characterize all values theta that minimize the absolute-error objective sum(|xi - theta|). Return a two-element list [L, R], where every theta in the closed interval [L, R] is a minimizer, and no theta outside that interval is a minimizer. For an odd number of elements, L and R will be equal. For an even number of elements, the interval may span the two middle order statistics. Use selection-style logic rather than relying on full sorting.

Constraints

1 <= len(xs) <= 200000
-10^12 <= xs[i] <= 10^12
All xs[i] are finite real numbers.
Expected O(n) time is desired; randomized quickselect is acceptable.

Examples

Input: ([3, 1, 2],)

Expected Output: [2, 2]

Explanation: Sorted order is [1, 2, 3]. The unique median is 2.

Input: ([7],)

Expected Output: [7, 7]

Explanation: A single value is the only minimizer.

Input: ([1, 2, 100, 3],)

Expected Output: [2, 3]

Explanation: Sorted order is [1, 2, 3, 100]. Any theta in [2, 3] minimizes the L1 loss.

Input: ([5, 5, 1, 9, 5, 2],)

Expected Output: [5, 5]

Explanation: Sorted order is [1, 2, 5, 5, 5, 9]. Both middle order statistics are 5.

Input: ([-2, -1, 4, 10],)

Expected Output: [-1, 4]

Explanation: For an even-length array, the L1 objective is flat between the two middle values.

Solution

def solution(xs):
    if not xs:
        raise ValueError('xs must be non-empty')
    import random

    def quickselect(values, k):
        arr = list(values)
        while True:
            pivot = random.choice(arr)
            lows = []
            highs = []
            pivots = []
            for v in arr:
                if v < pivot:
                    lows.append(v)
                elif v > pivot:
                    highs.append(v)
                else:
                    pivots.append(v)
            if k < len(lows):
                arr = lows
            elif k < len(lows) + len(pivots):
                return pivot
            else:
                k -= len(lows) + len(pivots)
                arr = highs

    n = len(xs)
    left_index = (n - 1) // 2
    right_index = n // 2
    left = quickselect(xs, left_index)
    right = quickselect(xs, right_index)
    return [left, right]

Time complexity: Expected O(n); worst-case O(n^2) for randomized quickselect. Space complexity: O(n).

Hints

The subgradient of sum(|xi - theta|) depends on how many points are less than, equal to, and greater than theta.
The endpoints of the minimizer interval are the two middle order statistics: indices (n - 1) // 2 and n // 2 in sorted order.

Part 3: Minimize Tilted Absolute Loss for the 90th Percentile

Given a non-empty list xs of real numbers, consider the tilted absolute loss with tau = 0.9: rho_tau(r) = r * (tau - I[r < 0]), where r = xi - theta. Find all theta values that minimize sum(rho_tau(xi - theta)). Return a two-element list [L, R], where every theta in [L, R] is a minimizer. This interval is the empirical 90th-percentile minimizer set. If 0.9 * n is not an integer, the interval collapses to one order statistic. If 0.9 * n is an integer, the loss is flat between two adjacent order statistics. Duplicate values may also collapse the interval. This quantile is much less sensitive to extreme high outliers than the mean, as long as fewer than about 10% of the data are affected.

Constraints

1 <= len(xs) <= 200000
-10^12 <= xs[i] <= 10^12
All xs[i] are finite real numbers.
Use tau = 9/10 exactly for rank calculations.
Expected O(n) time is desired; randomized quickselect is acceptable.

Examples

Input: ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],)

Expected Output: [9, 10]

Explanation: Here 0.9 * n = 9 exactly, so the loss is flat between the 9th and 10th sorted values.

Input: ([5],)

Expected Output: [5, 5]

Explanation: With one value, the only possible 90th-percentile minimizer is that value.

Input: ([1, 2, 3, 4, 100],)

Expected Output: [100, 100]

Explanation: Here 0.9 * 5 = 4.5, so the minimizer is the ceil(4.5) = 5th sorted value.

Input: ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10000],)

Expected Output: [10, 10]

Explanation: Here 0.9 * 11 = 9.9, so the 10th sorted value is selected; the single largest outlier does not change it.

Input: ([1, 2, 3, 4, 5, 5, 5, 5, 5, 100],)

Expected Output: [5, 100]

Explanation: Here 0.9 * n = 9 exactly. The 9th sorted value is 5 and the 10th is 100, so every theta in [5, 100] minimizes the tilted loss.

Solution

def solution(xs):
    if not xs:
        raise ValueError('xs must be non-empty')
    import random

    def quickselect(values, k):
        arr = list(values)
        while True:
            pivot = random.choice(arr)
            lows = []
            highs = []
            pivots = []
            for v in arr:
                if v < pivot:
                    lows.append(v)
                elif v > pivot:
                    highs.append(v)
                else:
                    pivots.append(v)
            if k < len(lows):
                arr = lows
            elif k < len(lows) + len(pivots):
                return pivot
            else:
                k -= len(lows) + len(pivots)
                arr = highs

    n = len(xs)
    scaled = 9 * n
    low_rank = (scaled + 9) // 10
    high_rank = scaled // 10 + 1
    low = quickselect(xs, low_rank - 1)
    high = quickselect(xs, high_rank - 1)
    return [low, high]

Time complexity: Expected O(n); worst-case O(n^2) for randomized quickselect. Space complexity: O(n).

Hints

At a minimizer theta, the counts must satisfy: number of values less than theta <= 0.9n <= number of values less than or equal to theta.
In 1-based sorted order, the minimizer interval is [x_(ceil(0.9n)), x_(floor(0.9n) + 1)]. Use integer arithmetic with 9n / 10.

Quick Overview

Minimize L2, L1, and quantile losses

Company: Google

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Part 1: Minimize Squared Deviations

Constraints

1 <= len(xs) <= 200000
-10^12 <= xs[i] <= 10^12
All xs[i] are finite real numbers.
For non-integer answers, an absolute or relative error of 1e-9 is acceptable.

Examples

Input: ([1, 2, 3, 4],)

Expected Output: 2.5

Explanation: The mean is (1 + 2 + 3 + 4) / 4 = 2.5.

Input: ([42],)

Expected Output: 42.0

Explanation: With one element, the unique minimizer is that element.

Input: ([-3, -1, 2],)

Expected Output: -0.6666666666666666

Explanation: The mean is (-3 - 1 + 2) / 3 = -2/3.

Input: ([1000000000, -1000000000, 3],)

Expected Output: 1.0

Explanation: The large positive and negative values cancel, leaving 3 / 3 = 1.

Solution

def solution(xs):
    if not xs:
        raise ValueError('xs must be non-empty')
    import math
    return math.fsum(xs) / len(xs)

Time complexity: O(n). Space complexity: O(1).

Hints

Differentiate sum((xi - theta)^2) with respect to theta and set the result to zero.
After simplification, the answer depends only on the sum of the values and the number of values.

Part 2: Minimize Absolute Deviations

Constraints

1 <= len(xs) <= 200000
-10^12 <= xs[i] <= 10^12
All xs[i] are finite real numbers.
Expected O(n) time is desired; randomized quickselect is acceptable.

Examples

Input: ([3, 1, 2],)

Expected Output: [2, 2]

Explanation: Sorted order is [1, 2, 3]. The unique median is 2.

Input: ([7],)

Expected Output: [7, 7]

Explanation: A single value is the only minimizer.

Input: ([1, 2, 100, 3],)

Expected Output: [2, 3]

Explanation: Sorted order is [1, 2, 3, 100]. Any theta in [2, 3] minimizes the L1 loss.

Input: ([5, 5, 1, 9, 5, 2],)

Expected Output: [5, 5]

Explanation: Sorted order is [1, 2, 5, 5, 5, 9]. Both middle order statistics are 5.

Input: ([-2, -1, 4, 10],)

Expected Output: [-1, 4]

Explanation: For an even-length array, the L1 objective is flat between the two middle values.

Solution

def solution(xs):
    if not xs:
        raise ValueError('xs must be non-empty')
    import random

    def quickselect(values, k):
        arr = list(values)
        while True:
            pivot = random.choice(arr)
            lows = []
            highs = []
            pivots = []
            for v in arr:
                if v < pivot:
                    lows.append(v)
                elif v > pivot:
                    highs.append(v)
                else:
                    pivots.append(v)
            if k < len(lows):
                arr = lows
            elif k < len(lows) + len(pivots):
                return pivot
            else:
                k -= len(lows) + len(pivots)
                arr = highs

    n = len(xs)
    left_index = (n - 1) // 2
    right_index = n // 2
    left = quickselect(xs, left_index)
    right = quickselect(xs, right_index)
    return [left, right]

Time complexity: Expected O(n); worst-case O(n^2) for randomized quickselect. Space complexity: O(n).

Hints

The subgradient of sum(|xi - theta|) depends on how many points are less than, equal to, and greater than theta.
The endpoints of the minimizer interval are the two middle order statistics: indices (n - 1) // 2 and n // 2 in sorted order.

Part 3: Minimize Tilted Absolute Loss for the 90th Percentile

Constraints

1 <= len(xs) <= 200000
-10^12 <= xs[i] <= 10^12
All xs[i] are finite real numbers.
Use tau = 9/10 exactly for rank calculations.
Expected O(n) time is desired; randomized quickselect is acceptable.

Examples

Input: ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],)

Expected Output: [9, 10]

Explanation: Here 0.9 * n = 9 exactly, so the loss is flat between the 9th and 10th sorted values.

Input: ([5],)

Expected Output: [5, 5]

Explanation: With one value, the only possible 90th-percentile minimizer is that value.

Input: ([1, 2, 3, 4, 100],)

Expected Output: [100, 100]

Explanation: Here 0.9 * 5 = 4.5, so the minimizer is the ceil(4.5) = 5th sorted value.

Input: ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10000],)

Expected Output: [10, 10]

Explanation: Here 0.9 * 11 = 9.9, so the 10th sorted value is selected; the single largest outlier does not change it.

Input: ([1, 2, 3, 4, 5, 5, 5, 5, 5, 100],)

Expected Output: [5, 100]

Explanation: Here 0.9 * n = 9 exactly. The 9th sorted value is 5 and the 10th is 100, so every theta in [5, 100] minimizes the tilted loss.

Solution

def solution(xs):
    if not xs:
        raise ValueError('xs must be non-empty')
    import random

    def quickselect(values, k):
        arr = list(values)
        while True:
            pivot = random.choice(arr)
            lows = []
            highs = []
            pivots = []
            for v in arr:
                if v < pivot:
                    lows.append(v)
                elif v > pivot:
                    highs.append(v)
                else:
                    pivots.append(v)
            if k < len(lows):
                arr = lows
            elif k < len(lows) + len(pivots):
                return pivot
            else:
                k -= len(lows) + len(pivots)
                arr = highs

    n = len(xs)
    scaled = 9 * n
    low_rank = (scaled + 9) // 10
    high_rank = scaled // 10 + 1
    low = quickselect(xs, low_rank - 1)
    high = quickselect(xs, high_rank - 1)
    return [low, high]

Time complexity: Expected O(n); worst-case O(n^2) for randomized quickselect. Space complexity: O(n).

Hints

At a minimizer theta, the counts must satisfy: number of values less than theta <= 0.9n <= number of values less than or equal to theta.
In 1-based sorted order, the minimizer interval is [x_(ceil(0.9n)), x_(floor(0.9n) + 1)]. Use integer arithmetic with 9n / 10.