What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Capital One.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Capital One during technical interviews.

Critique and Test Python Preprocessing Utilities Effectively

Quick Overview

This question evaluates a candidate's ability to review and critique Python preprocessing utilities, covering competencies in software design patterns (fit/transform separation), coding style and import hygiene, and unit-test specification for data-science pipelines.

Company: Capital One

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

##### Scenario

Code review of Python preprocessing utilities (OutlierHandler and three Imputer classes).

##### Question

- Summarize, at a high level, what the OutlierHandler class accomplishes.
- Why is keeping fit and transform as separate methods advantageous?
- Identify and critique any coding-style or import problems found in the three Imputer classes.
- Hand-write one critical unit test you would add for OutlierHandler.

##### Hints

Think scikit-learn patterns, PEP-8, avoiding "from numpy import *", assert expected vs. actual.
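For the unit-test part of the question, one reasonable candidate is a test that pins down the fit/transform contract: bounds are learned once in `fit` and reused unchanged in `transform`, even when the transform data contains new extremes. The sketch below is illustrative only. The class under review is not shown in the question, so `OutlierHandler` here is a hypothetical stand-in that assumes IQR-based clipping, and the attribute names `lower_` and `upper_` are assumptions as well.

```python
import statistics


class OutlierHandler:
    """Hypothetical stand-in; the real class under review is not shown."""

    def fit(self, data):
        # Assumes at least two non-None values; quartiles are medians of halves.
        clean = sorted(x for x in data if x is not None)
        half = len(clean) // 2
        q1 = statistics.median(clean[:half])
        q3 = statistics.median(clean[-half:])
        iqr = q3 - q1
        self.lower_ = q1 - 1.5 * iqr
        self.upper_ = q3 + 1.5 * iqr
        return self

    def transform(self, data):
        # Clip to the stored bounds; None passes through untouched.
        return [
            x if x is None else min(max(x, self.lower_), self.upper_)
            for x in data
        ]


def test_transform_reuses_bounds_learned_in_fit():
    handler = OutlierHandler().fit([10, 12, 13, 14, 15, 16, 18, 19, 100])

    # 1000 never appeared in fit; it must not widen the upper bound.
    actual = handler.transform([5, 15, 30, 1000, None])
    expected = [5, 15, 27.5, 27.5, None]

    assert actual == expected
```

A plain `assert expected == actual` function like this can be collected by pytest as written. The property it checks (no leakage from transform-time data into the thresholds) is arguably the one most worth protecting in a preprocessing pipeline.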

In many preprocessing pipelines, a utility first learns statistics from training data (`fit`) and later applies them to any dataset (`transform`). Implement a simplified OutlierHandler.

Your function receives two lists:

- `train`: the data used to learn outlier thresholds
- `values`: the data to transform

Rules:

1. Ignore `None` values in `train`.
2. Sort the remaining values and compute quartiles using this definition:
   - If the number of values is odd, exclude the overall median when splitting into lower and upper halves.
   - `Q1` is the median of the lower half.
   - `Q3` is the median of the upper half.
   - The median of an even-length list is the average of the two middle values.
3. Compute `IQR = Q3 - Q1`.
4. Compute bounds:
   - `lower = Q1 - 1.5 * IQR`
   - `upper = Q3 + 1.5 * IQR`
5. Transform each element in `values`:
   - keep `None` as `None`
   - if a number is below `lower`, replace it with `lower`
   - if a number is above `upper`, replace it with `upper`
   - otherwise leave it unchanged

If `train` contains fewer than 2 non-`None` values, return `values` unchanged.

Return the transformed list.

Constraints

0 <= len(train), len(values) <= 100000
Each element is either an integer, a float, or None
Quartiles must be computed from `train` only, never from `values`

Examples

Input: ([10, 12, 13, 14, 15, 16, 18, 19, 100], [5, 15, 30])

Expected Output: [5, 15, 27.5]

Explanation: From the training data, Q1 = 12.5 and Q3 = 18.5, so IQR = 6. The bounds are 3.5 and 27.5. Only 30 is above the upper bound, so it is clipped to 27.5.
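The arithmetic in this explanation can be checked with a few lines of Python. This is only a verification sketch: it uses `statistics.median` from the standard library rather than a hand-written helper, and the variable names are chosen for illustration.

```python
import statistics

train = sorted([10, 12, 13, 14, 15, 16, 18, 19, 100])

# With 9 values, the overall median (15) is excluded from both halves.
half = len(train) // 2
q1 = statistics.median(train[:half])    # median of [10, 12, 13, 14]
q3 = statistics.median(train[-half:])   # median of [16, 18, 19, 100]

iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

print(q1, q3, iqr, lower, upper)  # 12.5 18.5 6.0 3.5 27.5
```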

Input: ([None, 1, 2, 2, 3, 4, 100], [None, -5, 8, 4])

Expected Output: [None, -1.0, 7.0, 4]

Explanation: Ignoring None, the sorted training data is [1, 2, 2, 3, 4, 100]. Q1 = 2, Q3 = 4, so IQR = 2 and the bounds are -1 and 7. None stays None, -5 becomes -1, 8 becomes 7, and 4 stays unchanged.

Input: ([None, 7], [100, None, -3])

Expected Output: [100, None, -3]

Explanation: There is only one non-None training value, so the function must return `values` unchanged.

Input: ([2, 2, 2, 2], [2, 3, 1])

Expected Output: [2, 2.0, 2.0]

Explanation: Q1 = Q3 = 2, so IQR = 0 and both bounds are exactly 2. Any value not equal to 2 is clipped to 2.

Input: ([], [])

Expected Output: []

Explanation: Empty training data has fewer than 2 usable values, so the output is the input `values`, which is also empty.

Solution

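def solution(train, values):
    """Clip `values` to IQR-based bounds learned from `train` only."""

    def median(arr):
        # `arr` must already be sorted.
        n = len(arr)
        mid = n // 2
        if n % 2 == 1:
            return arr[mid]
        return (arr[mid - 1] + arr[mid]) / 2

    # "Fit" step: drop None and sort the training data.
    clean = sorted(x for x in train if x is not None)

    # Too few values to define quartiles: return a copy of the input.
    if len(clean) < 2:
        return list(values)

    n = len(clean)
    mid = n // 2

    if n % 2 == 1:
        # Odd length: the overall median clean[mid] belongs to neither half.
        lower_half = clean[:mid]
        upper_half = clean[mid + 1:]
    else:
        lower_half = clean[:mid]
        upper_half = clean[mid:]

    q1 = median(lower_half)
    q3 = median(upper_half)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr

    # "Transform" step: clip each value to [lower, upper]; None passes through.
    result = []
    for x in values:
        if x is None:
            result.append(None)
        elif x < lower:
            result.append(lower)
        elif x > upper:
            result.append(upper)
        else:
            result.append(x)

    return result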

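Time complexity: O(n log n + m), where n = len(train) and m = len(values). Sorting `train` dominates, and the transform is a single pass over `values`. Space complexity: O(n + m): O(n) for the sorted copy of `train` and O(m) for the output list.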

Hints

Write a small helper function to compute the median of a sorted list, then reuse it for Q1 and Q3.
This is a fit/transform pattern: learn the clipping bounds from `train` once, then apply them to every element in `values`.
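The two hints above can be combined into a small class that stores the learned bounds, so one `fit` can serve many `transform` calls. This is a minimal sketch under the problem's rules, not the class from the interview: `SimpleOutlierHandler`, `bounds_`, and `_median` are hypothetical names chosen for illustration.

```python
class SimpleOutlierHandler:
    """Hypothetical fit/transform wrapper around the problem's clipping rules."""

    def __init__(self):
        # (lower, upper) after a successful fit; None means "do not clip".
        self.bounds_ = None

    @staticmethod
    def _median(sorted_vals):
        # Helper reused for both Q1 and Q3; input must already be sorted.
        n = len(sorted_vals)
        mid = n // 2
        if n % 2 == 1:
            return sorted_vals[mid]
        return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

    def fit(self, train):
        clean = sorted(x for x in train if x is not None)
        if len(clean) < 2:
            self.bounds_ = None
            return self
        half = len(clean) // 2  # for odd lengths this skips the overall median
        q1 = self._median(clean[:half])
        q3 = self._median(clean[len(clean) - half:])
        iqr = q3 - q1
        self.bounds_ = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
        return self

    def transform(self, values):
        if self.bounds_ is None:
            return list(values)
        lower, upper = self.bounds_
        out = []
        for x in values:
            if x is None:
                out.append(None)
            elif x < lower:
                out.append(lower)
            elif x > upper:
                out.append(upper)
            else:
                out.append(x)
        return out


# Fit once, then reuse the same bounds on different inputs.
handler = SimpleOutlierHandler().fit([None, 1, 2, 2, 3, 4, 100])
first = handler.transform([None, -5, 8, 4])   # [None, -1.0, 7.0, 4]
second = handler.transform([50, 0])           # [7.0, 0]
```

Keeping the bounds on the instance is what makes the separation useful: the same thresholds learned from training data can be applied to validation or production data without recomputing anything from those later datasets.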