PracHub
QuestionsCoachesLearningGuidesInterview Prep
|Home/Coding & Algorithms/Two Sigma

Fill Missing Time-Series Values with Linear Interpolation (Duplicate Timestamps Allowed)

Last updated: Jul 2, 2026

Fill Missing Time-Series Values with Linear Interpolation (Duplicate Timestamps Allowed)

Company: Two Sigma

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Take-home Project

# Fill Missing Time-Series Values with Linear Interpolation (Duplicate Timestamps Allowed) You are given readings from a single sensor as a list of records. Each record is a pair `[timestamp, value]` where: - `timestamp` is an integer. - `value` is either a floating-point number or `null` (meaning the reading is missing). The records are **not guaranteed to be sorted**, and the **same timestamp may appear more than once** (duplicate records). Clean the series and fill in every missing value, in the following steps: 1. **Merge duplicate timestamps.** If a timestamp appears in multiple records: - If at least one of its records has a known (non-`null`) value, the merged value for that timestamp is the **arithmetic mean of all known values** recorded at that timestamp. - If **all** of its records are `null`, the merged result is a single missing point at that timestamp. 2. **Sort** the merged points by timestamp in ascending order. 3. **Interpolate interior gaps.** For each missing point at time `t` that lies strictly between two known points, let `(t1, v1)` be the nearest known point with `t1 < t` and `(t2, v2)` be the nearest known point with `t2 > t`. Fill it with the linear interpolation $$v = v_1 + (v_2 - v_1) \cdot \frac{t - t_1}{t_2 - t_1}$$ 4. **Fill the boundaries by holding the nearest known value.** Any missing point that comes before the first known point takes the value of the first known point; any missing point after the last known point takes the value of the last known point. Return the cleaned series as a list of `[timestamp, value]` pairs sorted by ascending timestamp — exactly one pair per distinct timestamp, with every value filled. ## Input - `records`: a list of `n` pairs `[timestamp, value]`, where `value` is a float or `null`. ## Output - A list of `[timestamp, value]` pairs, one per distinct timestamp, sorted by ascending timestamp, with all values filled as floats. ## Constraints - `1 <= n <= 10^5` - `-10^9 <= timestamp <= 10^9` - Known values satisfy `-10^6 <= value <= 10^6` - At least one record has a known (non-`null`) value. - Answers within an absolute error of `10^-6` of the reference values are accepted. ## Example 1 Input: ``` records = [[3, null], [1, 10.0], [5, 20.0], [3, null]] ``` Output: ``` [[1, 10.0], [3, 15.0], [5, 20.0]] ``` Explanation: The two duplicate records at timestamp `3` are both missing, so they merge into a single missing point. After sorting, the missing point at `t = 3` lies between `(1, 10.0)` and `(5, 20.0)`, so it is filled with `10.0 + (20.0 - 10.0) * (3 - 1) / (5 - 1) = 15.0`. ## Example 2 Input: ``` records = [[2, 4.0], [2, 6.0], [4, null], [6, 8.0], [0, null]] ``` Output: ``` [[0, 5.0], [2, 5.0], [4, 6.5], [6, 8.0]] ``` Explanation: Timestamp `2` appears twice with known values `4.0` and `6.0`, which merge to their mean `5.0`. The missing point at `t = 0` is before the first known point, so it holds the first known value `5.0`. The missing point at `t = 4` is interpolated between `(2, 5.0)` and `(6, 8.0)`: `5.0 + (8.0 - 5.0) * (4 - 2) / (6 - 2) = 6.5`.

Related Interview Questions

  • Implement Price-Time Order Matching - Two Sigma (medium)
  • Compute Piecewise Linear Interpolation - Two Sigma (medium)
  • Implement an In-Memory Database - Two Sigma (hard)
  • Merge two sorted linked lists - Two Sigma (hard)
  • Merge Two Sorted Lists - Two Sigma (hard)
|Home/Coding & Algorithms/Two Sigma

Fill Missing Time-Series Values with Linear Interpolation (Duplicate Timestamps Allowed)

Two Sigma logo
Two Sigma
May 13, 2025, 12:00 AM
easyData ScientistTake-home ProjectCoding & Algorithms
0
0

Fill Missing Time-Series Values with Linear Interpolation (Duplicate Timestamps Allowed)

You are given readings from a single sensor as a list of records. Each record is a pair [timestamp, value] where:

  • timestamp is an integer.
  • value is either a floating-point number or null (meaning the reading is missing).

The records are not guaranteed to be sorted, and the same timestamp may appear more than once (duplicate records).

Clean the series and fill in every missing value, in the following steps:

  1. Merge duplicate timestamps. If a timestamp appears in multiple records:
    • If at least one of its records has a known (non- null ) value, the merged value for that timestamp is the arithmetic mean of all known values recorded at that timestamp.
    • If all of its records are null , the merged result is a single missing point at that timestamp.
  2. Sort the merged points by timestamp in ascending order.
  3. Interpolate interior gaps. For each missing point at time t that lies strictly between two known points, let (t1, v1) be the nearest known point with t1 < t and (t2, v2) be the nearest known point with t2 > t . Fill it with the linear interpolation v=v1+(v2−v1)⋅t−t1t2−t1v = v_1 + (v_2 - v_1) \cdot \frac{t - t_1}{t_2 - t_1}v=v1​+(v2​−v1​)⋅t2​−t1​t−t1​​
  4. Fill the boundaries by holding the nearest known value. Any missing point that comes before the first known point takes the value of the first known point; any missing point after the last known point takes the value of the last known point.

Return the cleaned series as a list of [timestamp, value] pairs sorted by ascending timestamp — exactly one pair per distinct timestamp, with every value filled.

Input

  • records : a list of n pairs [timestamp, value] , where value is a float or null .

Output

  • A list of [timestamp, value] pairs, one per distinct timestamp, sorted by ascending timestamp, with all values filled as floats.

Constraints

  • 1 <= n <= 10^5
  • -10^9 <= timestamp <= 10^9
  • Known values satisfy -10^6 <= value <= 10^6
  • At least one record has a known (non- null ) value.
  • Answers within an absolute error of 10^-6 of the reference values are accepted.

Example 1

Input:

records = [[3, null], [1, 10.0], [5, 20.0], [3, null]]

Output:

[[1, 10.0], [3, 15.0], [5, 20.0]]

Explanation: The two duplicate records at timestamp 3 are both missing, so they merge into a single missing point. After sorting, the missing point at t = 3 lies between (1, 10.0) and (5, 20.0), so it is filled with 10.0 + (20.0 - 10.0) * (3 - 1) / (5 - 1) = 15.0.

Example 2

Input:

records = [[2, 4.0], [2, 6.0], [4, null], [6, 8.0], [0, null]]

Output:

[[0, 5.0], [2, 5.0], [4, 6.5], [6, 8.0]]

Explanation: Timestamp 2 appears twice with known values 4.0 and 6.0, which merge to their mean 5.0. The missing point at t = 0 is before the first known point, so it holds the first known value 5.0. The missing point at t = 4 is interpolated between (2, 5.0) and (6, 8.0): 5.0 + (8.0 - 5.0) * (4 - 2) / (6 - 2) = 6.5.

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More Coding & Algorithms•More Two Sigma•More Data Scientist•Two Sigma Data Scientist•Two Sigma Coding & Algorithms•Data Scientist Coding & Algorithms
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.