PracHub

Median, Variance, and Linear Trend of a Daily Temperature Dataset

Last updated: Jul 2, 2026

Company: Two Sigma

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Take-home Project


Related Interview Questions

  • Implement Price-Time Order Matching - Two Sigma (medium)
  • Compute Piecewise Linear Interpolation - Two Sigma (medium)
  • Implement an In-Memory Database - Two Sigma (hard)
  • Merge two sorted linked lists - Two Sigma (hard)
  • Merge Two Sorted Lists - Two Sigma (hard)
May 13, 2025, 12:00 AM

Median, Variance, and Linear Trend of a Daily Temperature Dataset

You are given n daily temperature readings collected in New York City. The data is a list of records [day, temp] where:

  • day is an integer day index. All day values are distinct, but the list is not necessarily sorted.
  • temp is the temperature reading for that day, a floating-point number.

You are also given an integer q — a query day index, which may lie outside the observed range of days.

Compute the following four things and return them in order:

  1. Median temperature. Sort the temperatures. If n is odd, the median is the middle value; if n is even, it is the arithmetic mean of the two middle values.
  2. Sample variance of the temperatures, using the n - 1 denominator: $$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$ where $y_i$ are the temperatures and $\bar{y}$ is their mean. (n >= 2 is guaranteed.)
  3. Ordinary least-squares simple linear regression of temperature on day index: the slope b and intercept a of the line temp = a + b * day that minimizes the sum of squared residuals: $$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$ where $x_i$ are the day indices and $\bar{x}$ is their mean. Because all day values are distinct and n >= 2, the denominator is never zero.
  4. Predicted temperature for the query day: $\hat{y} = a + b \cdot q$.
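
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference solution: the function name `temperature_stats` and its internal variable names are hypothetical, and it assumes the input already satisfies the stated guarantees (distinct days, at least two records).

```python
from typing import List, Sequence


def temperature_stats(records: Sequence[Sequence[float]], q: int) -> List[float]:
    """Return [median, sample_variance, slope, intercept, prediction].

    Hypothetical helper: the name and signature are illustrative only.
    """
    n = len(records)
    if n < 2:
        raise ValueError("at least two records are required")

    days = [float(day) for day, _ in records]
    temps = [float(temp) for _, temp in records]

    # Step 1: median of the sorted temperatures (the day order is irrelevant).
    ordered = sorted(temps)
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2.0

    # Step 2: sample variance with the n - 1 denominator.
    mean_x = sum(days) / n
    mean_y = sum(temps) / n
    syy = sum((y - mean_y) ** 2 for y in temps)
    variance = syy / (n - 1)

    # Step 3: OLS slope and intercept from centered sums.
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, temps))
    slope = sxy / sxx  # sxx > 0 because the days are distinct and n >= 2
    intercept = mean_y - slope * mean_x

    # Step 4: prediction for the query day (may extrapolate beyond the data).
    prediction = intercept + slope * q

    return [median, variance, slope, intercept, prediction]
```

Sorting dominates the cost, so the sketch runs in O(n log n) time. It uses centered sums, matching the formulas in the statement term by term, which also avoids subtracting two very large raw sums when day indices are large.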

Input

  • records: a list of n pairs [day, temp] with distinct integer day values and float temp values.
  • q: an integer query day index.

Output

  • A list of five floats: [median, sample_variance, slope, intercept, prediction]. The regression step contributes two of the five values, in the order slope, then intercept.

Constraints

  • 2 <= n <= 10^5
  • 0 <= day <= 10^6, all day values distinct
  • -100.0 <= temp <= 150.0
  • 0 <= q <= 2 * 10^6
  • Answers within an absolute error of 10^-4 of the reference values are accepted.
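
When testing a solution locally, the acceptance rule in the last constraint can be mimicked with a small comparison helper. The sketch below is an assumption about how such a check might look: the name `within_tolerance` is hypothetical, and treating the bound as inclusive (`<=`) is a guess, since the statement does not say how the grader handles an error of exactly `10^-4`.

```python
from typing import Sequence


def within_tolerance(
    answer: Sequence[float],
    reference: Sequence[float],
    tol: float = 1e-4,
) -> bool:
    """Return True if every value is within `tol` absolute error of the reference.

    Hypothetical local check; the real grader's behaviour is not specified
    beyond the absolute-error bound.
    """
    if len(answer) != len(reference):
        return False
    return all(abs(a - r) <= tol for a, r in zip(answer, reference))


# Usage: a full-precision variance still matches the rounded reference value.
ok = within_tolerance(
    [36.0, 80.0 / 3.0, 4.0, 30.0, 50.0],
    [36.0, 26.666667, 4.0, 30.0, 50.0],
)
```

Because the tolerance is absolute rather than relative, there is no need to round the results before returning them.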

Example 1

Input:

records = [[0, 30.0], [1, 34.0], [2, 38.0], [3, 42.0]]
q = 5

Output:

[36.0, 26.666667, 4.0, 30.0, 50.0]

Explanation: Sorted temperatures are [30, 34, 38, 42], so the median is (34 + 38) / 2 = 36.0. The mean is 36.0, and the sample variance is (36 + 4 + 4 + 36) / 3 = 26.666667. The best-fit line is temp = 30.0 + 4.0 * day, so the prediction for day 5 is 50.0.
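
The explanation states the best-fit line without showing the regression arithmetic. Applying the slope and intercept formulas to the four points of Example 1 gives:

$$
\begin{aligned}
\bar{x} &= \frac{0 + 1 + 2 + 3}{4} = 1.5, \qquad \bar{y} = \frac{30 + 34 + 38 + 42}{4} = 36 \\
\sum_{i=1}^{4} (x_i - \bar{x})(y_i - \bar{y}) &= (-1.5)(-6) + (-0.5)(-2) + (0.5)(2) + (1.5)(6) = 20 \\
\sum_{i=1}^{4} (x_i - \bar{x})^2 &= 2.25 + 0.25 + 0.25 + 2.25 = 5 \\
b &= \frac{20}{5} = 4, \qquad a = 36 - 4 \cdot 1.5 = 30 \\
\hat{y} &= 30 + 4 \cdot 5 = 50
\end{aligned}
$$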

Example 2

Input:

records = [[2, 50.0], [0, 54.0]]
q = 1

Output:

[52.0, 8.0, -2.0, 54.0, 52.0]

Explanation: The median of [50, 54] is 52.0 and the sample variance is ((54 - 52)^2 + (50 - 52)^2) / 1 = 8.0. The regression line through the two points is temp = 54.0 - 2.0 * day, giving a prediction of 52.0 for day 1.
