PracHub

Quick Overview

This question evaluates competency in statistical data processing and numerical computation, focusing on median and sample variance estimation, ordinary least-squares linear regression (slope and intercept), and point prediction from unsorted time-indexed temperature readings.

  • easy
  • Two Sigma
  • Coding & Algorithms
  • Data Scientist

Median, Variance, and Linear Trend of a Daily Temperature Dataset

Company: Two Sigma

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Take-home Project

# Median, Variance, and Linear Trend of a Daily Temperature Dataset

You are given `n` daily temperature readings collected in New York City. The data is a list of records `[day, temp]` where:

- `day` is an integer day index. All day values are **distinct**, but the list is **not necessarily sorted**.
- `temp` is the temperature reading for that day, a floating-point number.

You are also given an integer `q`, a query day index, which may lie outside the observed range of days.

Compute the following four quantities and return them in order (the regression contributes both a slope and an intercept, so five values are returned):

1. **Median temperature.** Sort the temperatures. If `n` is odd, the median is the middle value; if `n` is even, it is the arithmetic mean of the two middle values.

2. **Sample variance of the temperatures**, using the `n - 1` denominator:

   $$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$

   where $y_i$ are the temperatures and $\bar{y}$ is their mean. (`n >= 2` is guaranteed.)

3. **Ordinary least-squares simple linear regression** of temperature on day index: the slope `b` and intercept `a` of the line `temp = a + b * day` that minimizes the sum of squared residuals:

   $$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$

   where $x_i$ are the day indices and $\bar{x}$ is their mean. Because all day values are distinct and `n >= 2`, the denominator is never zero.

4. **Predicted temperature for the query day**: $\hat{y} = a + b \cdot q$.

## Input

- `records`: a list of `n` pairs `[day, temp]` with distinct integer `day` values and float `temp` values.
- `q`: an integer query day index.

## Output

- A list of five floats: `[median, sample_variance, slope, intercept, prediction]`.

## Constraints

- `2 <= n <= 10^5`
- `0 <= day <= 10^6`, all `day` values distinct
- `-100.0 <= temp <= 150.0`
- `0 <= q <= 2 * 10^6`
- Answers within an absolute error of `10^-4` of the reference values are accepted.

## Example 1

Input:

```
records = [[0, 30.0], [1, 34.0], [2, 38.0], [3, 42.0]]
q = 5
```

Output:

```
[36.0, 26.666667, 4.0, 30.0, 50.0]
```

Explanation: Sorted temperatures are `[30, 34, 38, 42]`, so the median is `(34 + 38) / 2 = 36.0`. The mean is `36.0`, and the sample variance is `(36 + 4 + 4 + 36) / 3 = 26.666667`. The best-fit line is `temp = 30.0 + 4.0 * day`, so the prediction for day `5` is `50.0`.

## Example 2

Input:

```
records = [[2, 50.0], [0, 54.0]]
q = 1
```

Output:

```
[52.0, 8.0, -2.0, 54.0, 52.0]
```

Explanation: The median of `[50, 54]` is `52.0` and the sample variance is `((54 - 52)^2 + (50 - 52)^2) / 1 = 8.0`. The regression line through the two points is `temp = 54.0 - 2.0 * day`, giving a prediction of `52.0` for day `1`.
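
The four steps in the statement translate fairly directly into code. The following is a minimal sketch, not a reference solution: the function name `summarize_temperatures` is a hypothetical choice, and it assumes the input already satisfies the stated constraints (`n >= 2`, distinct days), so no validation is done.

```python
def summarize_temperatures(records, q):
    """Return [median, sample_variance, slope, intercept, prediction].

    Hypothetical helper; assumes n >= 2 and distinct day values,
    as guaranteed by the problem statement.
    """
    n = len(records)
    days = [float(day) for day, _ in records]
    temps = [float(temp) for _, temp in records]

    # 1. Median: depends only on the sorted temperatures.
    ordered = sorted(temps)
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2.0

    # Means of days (x) and temperatures (y).
    x_bar = sum(days) / n
    y_bar = sum(temps) / n

    # Centered sums used by both the variance and the regression.
    s_yy = sum((y - y_bar) ** 2 for y in temps)
    s_xx = sum((x - x_bar) ** 2 for x in days)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, temps))

    # 2. Sample variance with the n - 1 denominator.
    variance = s_yy / (n - 1)

    # 3. OLS slope and intercept; s_xx > 0 because days are distinct.
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # 4. Point prediction at the query day.
    prediction = intercept + slope * q

    return [median, variance, slope, intercept, prediction]


print(summarize_temperatures([[2, 50.0], [0, 54.0]], 1))
```

Sorting dominates the cost at O(n log n); everything else is a constant number of linear passes over the data.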


You are given `n` daily temperature readings collected in New York City as a list of records `[day, temp]`:

  • `day` is an integer day index. All day values are **distinct**, but the list is **not necessarily sorted**.
  • `temp` is the temperature reading for that day, a floating-point number.

You are also given an integer `q`, a query day index, which may lie outside the observed range of days.

Compute the following four things and return them in order:

  1. **Median temperature.** Sort the temperatures. If `n` is odd, the median is the middle value; if `n` is even, it is the arithmetic mean of the two middle values.
  2. **Sample variance of the temperatures**, using the `n - 1` denominator: `s^2 = (1/(n-1)) * sum((y_i - ybar)^2)`, where `y_i` are the temperatures and `ybar` is their mean (`n >= 2` is guaranteed).
  3. **Ordinary least-squares simple linear regression** of temperature on day index: the slope `b` and intercept `a` of the line `temp = a + b * day` that minimizes the sum of squared residuals, with `b = sum((x_i - xbar)(y_i - ybar)) / sum((x_i - xbar)^2)` and `a = ybar - b * xbar`, where `x_i` are the day indices. Because all day values are distinct and `n >= 2`, the denominator is never zero.
  4. **Predicted temperature for the query day**: `yhat = a + b * q`.

**Input:** `records`, a list of `n` pairs `[day, temp]` with distinct integer `day` values and float `temp` values; `q`, an integer query day index.

**Output:** A list of five floats `[median, sample_variance, slope, intercept, prediction]`. Answers within an absolute error of `1e-4` of the reference values are accepted.

Constraints

  • 2 <= n <= 10^5
  • 0 <= day <= 10^6, all day values distinct
  • -100.0 <= temp <= 150.0
  • 0 <= q <= 2 * 10^6
  • Answers within an absolute error of 1e-4 of the reference values are accepted

Examples

Input: ([[0, 30.0], [1, 34.0], [2, 38.0], [3, 42.0]], 5)

Expected Output: [36.0, 26.666666666666668, 4.0, 30.0, 50.0]

Explanation: Sorted temps [30, 34, 38, 42] give median (34+38)/2 = 36.0. Mean 36.0, so variance = (36+4+4+36)/3 = 26.6667. Best-fit line temp = 30 + 4*day, so prediction at day 5 is 50.0.

Input: ([[2, 50.0], [0, 54.0]], 1)

Expected Output: [52.0, 8.0, -2.0, 54.0, 52.0]

Explanation: Median of [50, 54] is 52.0; variance = ((54-52)^2 + (50-52)^2)/1 = 8.0. Line through the two points is temp = 54 - 2*day, giving 52.0 at day 1. Note the list is unsorted by day.

Input: ([[5, 10.0], [1, 2.0], [3, 6.0]], 10)

Expected Output: [6.0, 16.0, 2.0, 0.0, 20.0]

Explanation: Odd n=3: sorted temps [2, 6, 10] give median 6.0. Mean 6.0, variance = (16+0+16)/2 = 16.0. The points lie exactly on temp = 2*day, so slope 2, intercept 0, prediction at day 10 is 20.0.

Input: ([[0, -40.0], [4, -20.0], [2, -25.0], [6, -10.0]], 8)

Expected Output: [-22.5, 156.25, 4.75, -38.0, 0.0]

Explanation: Negative temps with unsorted days. Sorted temps [-40, -25, -20, -10] give median (-25 + -20)/2 = -22.5. Mean -23.75, variance = 468.75/3 = 156.25. OLS gives slope 4.75, intercept -38.0, prediction at day 8 = 0.0.
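
The explanation above states the regression result without the intermediate arithmetic. As a worked check, taking the records in their given order with $\bar{x} = 3$ and $\bar{y} = -23.75$:

$$\sum_{i}(x_i - \bar{x})(y_i - \bar{y}) = (-3)(-16.25) + (1)(3.75) + (-1)(-1.25) + (3)(13.75) = 95$$

$$\sum_{i}(x_i - \bar{x})^2 = 9 + 1 + 1 + 9 = 20$$

$$b = \frac{95}{20} = 4.75, \qquad a = -23.75 - 4.75 \cdot 3 = -38, \qquad \hat{y} = -38 + 4.75 \cdot 8 = 0$$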

Input: ([[100, 20.0], [50, 20.0]], 999)

Expected Output: [20.0, 0.0, 0.0, 20.0, 20.0]

Explanation: n=2 minimum with equal temps: median 20.0, variance 0.0. All temps identical means the best-fit line is horizontal (slope 0, intercept 20), so the prediction for any query day (999) is 20.0.
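
The expected outputs appear both rounded (`26.666667`) and at full double precision (`26.666666666666668`); under the stated `1e-4` absolute tolerance the two are interchangeable. The sketch below illustrates this with a hypothetical `within_tolerance` helper. It assumes the checker compares element by element using absolute error, which the page does not spell out.

```python
import math


def within_tolerance(actual, expected, tol=1e-4):
    """True if the lists have equal length and each pair differs by at most tol.

    Hypothetical helper; assumes an element-wise absolute-error comparison.
    """
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=0.0, abs_tol=tol)
        for a, e in zip(actual, expected)
    )


full = [36.0, 26.666666666666668, 4.0, 30.0, 50.0]
rounded = [36.0, 26.666667, 4.0, 30.0, 50.0]
too_coarse = [36.0, 26.6665, 4.0, 30.0, 50.0]

print(within_tolerance(rounded, full))     # True: off by about 3e-7
print(within_tolerance(too_coarse, full))  # False: off by about 1.7e-4
```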

Hints

  1. Median only needs the sorted temperatures — the day indices don't matter for it. Sorting a copy of the temps (O(n log n)) is enough; a linear-time selection is optional.
  2. Sample variance uses the n-1 denominator (Bessel's correction), not n. Compute the temperature mean first, then sum the squared deviations.
  3. For the OLS slope, compute the means xbar (of days) and ybar (of temps) in one pass, then b = sum((x-xbar)(y-ybar)) / sum((x-xbar)^2). The intercept is a = ybar - b*xbar, and the prediction is simply a + b*q.
  4. The regression uses day as the independent variable x and temp as the dependent variable y — don't swap them. The median/variance use only y (temps).
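
Hint 1 mentions that a linear-time selection is optional. For illustration, here is one way that could look: a randomized quickselect with expected O(n) time. The names `quickselect` and `median_by_selection` are hypothetical, and for `n <= 10^5` a plain sort is likely just as practical.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) in expected O(n) time.

    Hypothetical helper; assumes 0 <= k < len(values) and no NaN values.
    """
    items = list(values)  # work on a copy so the caller's list is untouched
    while True:
        pivot = random.choice(items)
        lower = [v for v in items if v < pivot]
        equal_count = sum(1 for v in items if v == pivot)
        if k < len(lower):
            # Target lies among the elements smaller than the pivot.
            items = lower
        elif k < len(lower) + equal_count:
            # Target is the pivot itself (or a duplicate of it).
            return pivot
        else:
            # Target lies among the larger elements; shift the rank.
            k -= len(lower) + equal_count
            items = [v for v in items if v > pivot]


def median_by_selection(temps):
    """Median via selection: one pick for odd n, two picks for even n."""
    n = len(temps)
    if n % 2 == 1:
        return quickselect(temps, n // 2)
    return (quickselect(temps, n // 2 - 1) + quickselect(temps, n // 2)) / 2.0


print(median_by_selection([-40.0, -20.0, -25.0, -10.0]))  # -22.5
```

The pivot is random, but the returned value is deterministic: only the running time varies between runs.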
Last updated: Jul 2, 2026


Related Coding Questions

  • Implement Price-Time Order Matching - Two Sigma (medium)
  • Compute Piecewise Linear Interpolation - Two Sigma (medium)
  • Implement an In-Memory Database - Two Sigma (hard)
  • Merge two sorted linked lists - Two Sigma (hard)
  • Merge Two Sorted Lists - Two Sigma (hard)