Median, Variance, and Linear Trend of a Daily Temperature Dataset
Company: Two Sigma
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Take-home Project
You are given n daily temperature readings collected in New York City. The data is a list of records [day, temp] where:
- day is an integer day index. All day values are distinct, but the list is not necessarily sorted.
- temp is the temperature reading for that day, a floating-point number.
You are also given an integer q — a query day index, which may lie outside the observed range of days.
Compute the following four things and return them in order:
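1. Median of the temperatures. If n is odd, the median is the middle value of the sorted temperatures; if n is even, it is the arithmetic mean of the two middle values.
2. Sample variance of the temperatures, using the n - 1 denominator:
   $$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (t_i - \bar{t})^2$$
   where $t_i$ are the temperatures and $\bar{t}$ is their mean. (n >= 2 is guaranteed.)
3. Slope b and intercept a of the line temp = a + b * day that minimizes the sum of squared residuals:
   $$b = \frac{\sum_{i=1}^{n} (d_i - \bar{d})(t_i - \bar{t})}{\sum_{i=1}^{n} (d_i - \bar{d})^2}, \qquad a = \bar{t} - b\,\bar{d}$$
   where $d_i$ are the day indices and $\bar{d}$ is their mean. Because all day values are distinct and n >= 2, the denominator is never zero.
4. Prediction for the query day: the value a + b * q given by the fitted line.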
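The computations described above can be sketched in Python as follows. This is a minimal illustration rather than a reference solution: the function name `temperature_summary` and its local variable names are assumptions, and the slope is computed from mean-centered sums, which is one standard closed form of the least-squares fit.

```python
from typing import List, Sequence


def temperature_summary(records: Sequence[Sequence[float]], q: int) -> List[float]:
    """Return [median, sample_variance, slope, intercept, prediction]."""
    n = len(records)
    days = [r[0] for r in records]
    temps = [r[1] for r in records]

    # Median: sort a copy of the temperatures and take the middle value(s).
    ordered = sorted(temps)
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2.0

    # Sample variance with the n - 1 denominator.
    mean_t = sum(temps) / n
    variance = sum((t - mean_t) ** 2 for t in temps) / (n - 1)

    # Least-squares slope and intercept from mean-centered sums.
    mean_d = sum(days) / n
    sxy = sum((d - mean_d) * (t - mean_t) for d, t in zip(days, temps))
    sxx = sum((d - mean_d) ** 2 for d in days)  # nonzero: days are distinct, n >= 2
    slope = sxy / sxx
    intercept = mean_t - slope * mean_d

    # Prediction for the query day, which may lie outside the observed range.
    prediction = intercept + slope * q
    return [median, variance, slope, intercept, prediction]
```

Sorting dominates the cost at O(n log n); the remaining steps are single linear passes, so the records never need to be sorted by day.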
The inputs are:
- records: a list of n pairs [day, temp] with distinct integer day values and float temp values.
- q: an integer query day index.
Return the five values as a list [median, sample_variance, slope, intercept, prediction].
Constraints:
- 2 <= n <= 10^5
- 0 <= day <= 10^6, all day values distinct
- -100.0 <= temp <= 150.0
- 0 <= q <= 2 * 10^6
- Answers within 10^-4 of the reference values are accepted.
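The acceptance rule does not say whether the 10^-4 tolerance is absolute or relative. As a minimal sketch, assuming an absolute tolerance applied to each of the five returned values, a comparison might look like the following; `within_tolerance` is a hypothetical helper and not part of the problem statement.

```python
from typing import Sequence


def within_tolerance(got: Sequence[float], expected: Sequence[float],
                     tol: float = 1e-4) -> bool:
    """Hypothetical checker: True if every value is within `tol` of its reference.

    Assumes an absolute (not relative) tolerance, which the statement
    does not specify.
    """
    if len(got) != len(expected):
        return False
    return all(abs(g - e) <= tol for g, e in zip(got, expected))
```

Under this assumption, a result such as 26.6666667 would match the reference value 26.666667, while 26.67 would not.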
Input:
records = [[0, 30.0], [1, 34.0], [2, 38.0], [3, 42.0]]
q = 5
Output:
[36.0, 26.666667, 4.0, 30.0, 50.0]
Explanation: Sorted temperatures are [30, 34, 38, 42], so the median is (34 + 38) / 2 = 36.0. The mean is 36.0, and the sample variance is (36 + 4 + 4 + 36) / 3 = 26.666667. The best-fit line is temp = 30.0 + 4.0 * day, so the prediction for day 5 is 50.0.
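The regression step in this example can be written out using the mean-centered form of the least-squares solution. The notation is chosen for illustration: $d_i$ and $t_i$ denote the day indices and temperatures, and $\bar{d}$ and $\bar{t}$ their means.

```latex
\begin{aligned}
\bar{d} &= \tfrac{0 + 1 + 2 + 3}{4} = 1.5, \qquad \bar{t} = \tfrac{30 + 34 + 38 + 42}{4} = 36 \\
\sum_i (d_i - \bar{d})(t_i - \bar{t}) &= (-1.5)(-6) + (-0.5)(-2) + (0.5)(2) + (1.5)(6) = 20 \\
\sum_i (d_i - \bar{d})^2 &= 2.25 + 0.25 + 0.25 + 2.25 = 5 \\
b &= 20 / 5 = 4, \qquad a = \bar{t} - b\,\bar{d} = 36 - 4(1.5) = 30 \\
a + 5b &= 30 + 20 = 50
\end{aligned}
```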
Input:
records = [[2, 50.0], [0, 54.0]]
q = 1
Output:
[52.0, 8.0, -2.0, 54.0, 52.0]
Explanation: The median of [50, 54] is 52.0 and the sample variance is ((54 - 52)^2 + (50 - 52)^2) / 1 = 8.0. The regression line through the two points is temp = 54.0 - 2.0 * day, giving a prediction of 52.0 for day 1.
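This second example can be cross-checked independently. The sketch below uses `statistics.median` and `statistics.variance` from the Python standard library for the first two values, and relies on the fact that with exactly two points the least-squares line passes through both of them. The variable names are illustrative assumptions.

```python
import statistics

records = [[2, 50.0], [0, 54.0]]
q = 1

temps = [t for _, t in records]
median = statistics.median(temps)      # mean of the two values when n is even
variance = statistics.variance(temps)  # sample variance, n - 1 denominator

# With n = 2 the best-fit line is simply the line through both points.
(d1, t1), (d2, t2) = records
slope = (t2 - t1) / (d2 - d1)
intercept = t1 - slope * d1
prediction = intercept + slope * q

result = [median, variance, slope, intercept, prediction]
print(result)
```

The two-point shortcut holds only for n = 2; for larger inputs the general least-squares formulas are needed.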