Fill Missing Time-Series Values with Linear Interpolation (Duplicate Timestamps Allowed)
Company: Two Sigma
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Take-home Project
Company: Two Sigma
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Take-home Project
You are given readings from a single sensor as a list of records. Each record is a pair [timestamp, value] where:
timestamp
is an integer.
value
is either a floating-point number or
null
(meaning the reading is missing).
The records are not guaranteed to be sorted, and the same timestamp may appear more than once (duplicate records).
Clean the series and fill in every missing value, in the following steps:
null
) value, the merged value for that timestamp is the
arithmetic mean of all known values
recorded at that timestamp.
null
, the merged result is a single missing point at that timestamp.
t
that lies strictly between two known points, let
(t1, v1)
be the nearest known point with
t1 < t
and
(t2, v2)
be the nearest known point with
t2 > t
. Fill it with the linear interpolation
Return the cleaned series as a list of [timestamp, value] pairs sorted by ascending timestamp — exactly one pair per distinct timestamp, with every value filled.
records
: a list of
n
pairs
[timestamp, value]
, where
value
is a float or
null
.
[timestamp, value]
pairs, one per distinct timestamp, sorted by ascending timestamp, with all values filled as floats.
1 <= n <= 10^5
-10^9 <= timestamp <= 10^9
-10^6 <= value <= 10^6
null
) value.
10^-6
of the reference values are accepted.
Input:
records = [[3, null], [1, 10.0], [5, 20.0], [3, null]]
Output:
[[1, 10.0], [3, 15.0], [5, 20.0]]
Explanation: The two duplicate records at timestamp 3 are both missing, so they merge into a single missing point. After sorting, the missing point at t = 3 lies between (1, 10.0) and (5, 20.0), so it is filled with 10.0 + (20.0 - 10.0) * (3 - 1) / (5 - 1) = 15.0.
Input:
records = [[2, 4.0], [2, 6.0], [4, null], [6, 8.0], [0, null]]
Output:
[[0, 5.0], [2, 5.0], [4, 6.5], [6, 8.0]]
Explanation: Timestamp 2 appears twice with known values 4.0 and 6.0, which merge to their mean 5.0. The missing point at t = 0 is before the first known point, so it holds the first known value 5.0. The missing point at t = 4 is interpolated between (2, 5.0) and (6, 8.0): 5.0 + (8.0 - 5.0) * (4 - 2) / (6 - 2) = 6.5.