How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a easy difficulty Coding & Algorithms question, commonly asked during Take-home Project rounds at Two Sigma.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Two Sigma during technical interviews.

Fill Missing Time-Series Values with Linear Interpolation (Duplicate Timestamps Allowed)

Q: Fill Missing Time-Series Values with Linear Interpolation (Duplicate Timestamps Allowed)

This question evaluates a candidate's ability to perform time-series data cleaning and aggregation of duplicate timestamps, plus numerical linear interpolation and boundary value propagation, within the Coding & Algorithms domain.

Company: Two Sigma

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Take-home Project

# Fill Missing Time-Series Values with Linear Interpolation (Duplicate Timestamps Allowed) You are given readings from a single sensor as a list of records. Each record is a pair `[timestamp, value]` where: - `timestamp` is an integer. - `value` is either a floating-point number or `null` (meaning the reading is missing). The records are **not guaranteed to be sorted**, and the **same timestamp may appear more than once** (duplicate records). Clean the series and fill in every missing value, in the following steps: 1. **Merge duplicate timestamps.** If a timestamp appears in multiple records: - If at least one of its records has a known (non-`null`) value, the merged value for that timestamp is the **arithmetic mean of all known values** recorded at that timestamp. - If **all** of its records are `null`, the merged result is a single missing point at that timestamp. 2. **Sort** the merged points by timestamp in ascending order. 3. **Interpolate interior gaps.** For each missing point at time `t` that lies strictly between two known points, let `(t1, v1)` be the nearest known point with `t1 < t` and `(t2, v2)` be the nearest known point with `t2 > t`. Fill it with the linear interpolation $$v = v_1 + (v_2 - v_1) \cdot \frac{t - t_1}{t_2 - t_1}$$ 4. **Fill the boundaries by holding the nearest known value.** Any missing point that comes before the first known point takes the value of the first known point; any missing point after the last known point takes the value of the last known point. Return the cleaned series as a list of `[timestamp, value]` pairs sorted by ascending timestamp — exactly one pair per distinct timestamp, with every value filled. ## Input - `records`: a list of `n` pairs `[timestamp, value]`, where `value` is a float or `null`. ## Output - A list of `[timestamp, value]` pairs, one per distinct timestamp, sorted by ascending timestamp, with all values filled as floats. ## Constraints - `1 <= n <= 10^5` - `-10^9 <= timestamp <= 10^9` - Known values satisfy `-10^6 <= value <= 10^6` - At least one record has a known (non-`null`) value. - Answers within an absolute error of `10^-6` of the reference values are accepted. ## Example 1 Input: ``` records = [[3, null], [1, 10.0], [5, 20.0], [3, null]] ``` Output: ``` [[1, 10.0], [3, 15.0], [5, 20.0]] ``` Explanation: The two duplicate records at timestamp `3` are both missing, so they merge into a single missing point. After sorting, the missing point at `t = 3` lies between `(1, 10.0)` and `(5, 20.0)`, so it is filled with `10.0 + (20.0 - 10.0) * (3 - 1) / (5 - 1) = 15.0`. ## Example 2 Input: ``` records = [[2, 4.0], [2, 6.0], [4, null], [6, 8.0], [0, null]] ``` Output: ``` [[0, 5.0], [2, 5.0], [4, 6.5], [6, 8.0]] ``` Explanation: Timestamp `2` appears twice with known values `4.0` and `6.0`, which merge to their mean `5.0`. The missing point at `t = 0` is before the first known point, so it holds the first known value `5.0`. The missing point at `t = 4` is interpolated between `(2, 5.0)` and `(6, 8.0)`: `5.0 + (8.0 - 5.0) * (4 - 2) / (6 - 2) = 6.5`.

Quick Answer: This question evaluates a candidate's ability to perform time-series data cleaning and aggregation of duplicate timestamps, plus numerical linear interpolation and boundary value propagation, within the Coding & Algorithms domain.

You are given readings from a single sensor as a list of records. Each record is a pair `[timestamp, value]` where `timestamp` is an integer and `value` is either a floating-point number or `null` (a missing reading). The records are NOT guaranteed to be sorted, and the same timestamp may appear more than once. Clean the series and fill in every missing value: 1. **Merge duplicate timestamps.** If a timestamp appears in multiple records: if at least one record has a known (non-null) value, the merged value is the arithmetic mean of all known values at that timestamp; if all of its records are null, it becomes a single missing point. 2. **Sort** the merged points by ascending timestamp. 3. **Interpolate interior gaps.** For a missing point at time `t` strictly between two known points `(t1, v1)` (nearest known with `t1 < t`) and `(t2, v2)` (nearest known with `t2 > t`), fill it with `v = v1 + (v2 - v1) * (t - t1) / (t2 - t1)`. 4. **Fill boundaries by holding the nearest known value.** Any missing point before the first known point takes the first known value; any missing point after the last known point takes the last known value. Return the cleaned series as a list of `[timestamp, value]` pairs — exactly one pair per distinct timestamp, sorted by ascending timestamp, with every value filled as a float. **Example 1** Input: `records = [[3, null], [1, 10.0], [5, 20.0], [3, null]]` Output: `[[1, 10.0], [3, 15.0], [5, 20.0]]` The two null records at t=3 merge to one missing point, interpolated between (1, 10.0) and (5, 20.0): 10.0 + 10.0*(3-1)/(5-1) = 15.0. **Example 2** Input: `records = [[2, 4.0], [2, 6.0], [4, null], [6, 8.0], [0, null]]` Output: `[[0, 5.0], [2, 5.0], [4, 6.5], [6, 8.0]]` t=2 appears twice (4.0, 6.0) → mean 5.0. t=0 is before the first known point → holds 5.0. t=4 is interpolated between (2, 5.0) and (6, 8.0): 5.0 + 3.0*(4-2)/(6-2) = 6.5.

Constraints

1 <= n <= 10^5
-10^9 <= timestamp <= 10^9
-10^6 <= value <= 10^6 for known values
At least one record has a known (non-null) value.
Answers within an absolute error of 10^-6 of the reference are accepted.

Examples

Input: ([[3, None], [1, 10.0], [5, 20.0], [3, None]],)

Expected Output: [[1, 10.0], [3, 15.0], [5, 20.0]]

Explanation: The two null records at t=3 merge into one missing point, interpolated between (1, 10.0) and (5, 20.0): 10.0 + 10.0*(3-1)/(5-1) = 15.0.

Input: ([[2, 4.0], [2, 6.0], [4, None], [6, 8.0], [0, None]],)

Expected Output: [[0, 5.0], [2, 5.0], [4, 6.5], [6, 8.0]]

Explanation: t=2 averages 4.0 and 6.0 to 5.0. t=0 precedes the first known point so it holds 5.0. t=4 interpolates between (2, 5.0) and (6, 8.0): 5.0 + 3.0*(4-2)/(6-2) = 6.5.

Input: ([[5, 3.0]],)

Expected Output: [[5, 3.0]]

Explanation: A single known point is returned unchanged.

Input: ([[1, 10.0], [1, None], [2, None], [3, 30.0]],)

Expected Output: [[1, 10.0], [2, 20.0], [3, 30.0]]

Explanation: At t=1 the null is ignored, leaving the known value 10.0. The missing point at t=2 interpolates between (1, 10.0) and (3, 30.0): 10.0 + 20.0*(2-1)/(3-1) = 20.0.

Input: ([[0, 5.0], [1, None], [2, None]],)

Expected Output: [[0, 5.0], [1, 5.0], [2, 5.0]]

Explanation: Both missing points come after the last (and only) known point, so they hold its value 5.0.

Input: ([[2, -4.0], [0, None], [-1, None]],)

Expected Output: [[-1, -4.0], [0, -4.0], [2, -4.0]]

Explanation: After sorting, the two missing points precede the only known point (2, -4.0), so they hold -4.0.

Hints

First group records by timestamp: accumulate a running sum and count of only the non-null values per timestamp so you can compute the mean; a timestamp with zero known values stays a missing point.
After sorting the distinct timestamps, record the indices of the points that have a known value. Every missing point falls into exactly one of three buckets: before the first known index, after the last known index, or strictly between two consecutive known indices.
For interior gaps, walk between each pair of consecutive known indices and apply v = v1 + (v2 - v1) * (t - t1) / (t2 - t1); for the boundary buckets simply hold the nearest known value.

Quick Overview

This question evaluates a candidate's ability to perform time-series data cleaning and aggregation of duplicate timestamps, plus numerical linear interpolation and boundary value propagation, within the Coding & Algorithms domain.