You are given historical daily temperature data for New York City and several nearby towns. Each row contains a date, the NYC temperature, and the temperature for each town on that date.
Answer the following:
- Volatility analysis
  - Determine which town has the largest temperature fluctuation over time.
  - Clearly define the metric you use for fluctuation.
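
One possible fluctuation metric, sketched here as an assumption rather than the required answer, is the sample standard deviation of day-to-day temperature changes. Unlike the plain standard deviation of the raw temperatures, it is less dominated by the seasonal cycle that all towns share. The `temps_by_town` dictionary layout and the town names are hypothetical.

```python
from statistics import stdev

def most_volatile_town(temps_by_town):
    """Return (town, score), where score is the sample standard deviation
    of day-to-day temperature changes (an assumed volatility metric)."""
    scores = {}
    for town, temps in temps_by_town.items():
        # Difference between each day and the previous day.
        daily_changes = [b - a for a, b in zip(temps, temps[1:])]
        scores[town] = stdev(daily_changes)  # needs at least 3 days of data
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical toy data: one list of daily temperatures per town.
sample = {
    "TownA": [50.0, 52.0, 51.0, 53.0],
    "TownB": [50.0, 60.0, 45.0, 65.0],
}
town, score = most_volatile_town(sample)
print(town, round(score, 2))
```

Whichever metric is chosen, stating it explicitly matters because different definitions (range, variance, standard deviation of differences) can rank the towns differently.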
- Similarity analysis
  - Determine which town's temperature pattern is most similar to NYC's.
  - Clearly define the similarity metric you use.
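
A minimal sketch of one similarity metric, assuming Pearson correlation between each town's series and the NYC series over the same dates. Correlation compares the shape of the pattern and ignores a constant offset; root mean squared difference would be a reasonable alternative if absolute levels should also match. The data layout and names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length series.
    Raises ZeroDivisionError if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def most_similar_town(nyc, temps_by_town):
    """Return (town, correlation) for the town most correlated with NYC."""
    scores = {town: pearson(nyc, temps) for town, temps in temps_by_town.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical toy data aligned by date.
nyc = [30.0, 40.0, 50.0, 60.0]
towns = {
    "TownA": [32.0, 41.0, 52.0, 61.0],
    "TownB": [60.0, 35.0, 55.0, 40.0],
}
best_town, corr = most_similar_town(nyc, towns)
print(best_town, round(corr, 4))
```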
- Prediction task
  - Use the towns' temperature data to predict NYC temperature.
  - Train a regression model and evaluate it using mean squared error, or MSE.
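
The prediction task could be sketched as ordinary least squares with an intercept, fitted on an earlier block of days and scored on later days. This is only one possible model; the pure-Python normal-equations solver, the chronological split, and the toy numbers are all assumptions made for illustration, and a numerical library would normally replace the hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting.
    Fails with ZeroDivisionError if the system is singular
    (for example, perfectly collinear features)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w

def fit_linear(features, target):
    """Least squares with intercept via the normal equations.
    features is a list of rows, one value per town."""
    rows = [[1.0] + list(r) for r in features]  # leading 1.0 is the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(weights, features):
    return [weights[0] + sum(w * x for w, x in zip(weights[1:], r))
            for r in features]

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: two towns, NYC = 1 + 0.5 * A + 0.25 * B exactly.
town_rows = [[10, 20], [12, 18], [15, 25], [20, 22], [25, 30], [30, 28]]
nyc = [1 + 0.5 * a + 0.25 * b for a, b in town_rows]

# Chronological split: earlier days for training, later days for validation.
weights = fit_linear(town_rows[:4], nyc[:4])
val_mse = mse(nyc[4:], predict(weights, town_rows[4:]))
print(weights, val_mse)
```

A chronological split is used here because the rows are a time series; shuffling before splitting would let the model see days adjacent to the ones it is scored on.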
- Greedy feature selection
  - Given a target number k, choose k towns to use as features.
  - Start with no selected towns.
  - At each step, add the town that gives the largest reduction in validation MSE when combined with the already selected towns.
  - Return the selected towns and the final MSE.
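
The greedy procedure above can be sketched as forward selection. To keep the sketch short, the model fitting is hidden behind a hypothetical callback `validation_mse(towns)`, which is assumed to train a regression on the given towns and return its validation MSE. The lookup table used in the demo is made-up toy data, not a real result.

```python
def greedy_select(towns, k, validation_mse):
    """Forward selection: repeatedly add the town whose inclusion yields
    the lowest validation MSE, until k towns are selected."""
    selected = []
    remaining = list(towns)
    final_mse = float("inf")
    for _ in range(min(k, len(remaining))):
        # Score every candidate together with the already selected towns.
        scored = [(validation_mse(selected + [t]), t) for t in remaining]
        step_mse, step_town = min(scored)  # ties broken by town name
        selected.append(step_town)
        remaining.remove(step_town)
        final_mse = step_mse
    return selected, final_mse

# Hypothetical validation MSE for each subset the search will visit.
toy_mse = {
    frozenset(["A"]): 9.0,
    frozenset(["B"]): 4.0,
    frozenset(["C"]): 6.0,
    frozenset(["A", "B"]): 3.0,
    frozenset(["B", "C"]): 1.5,
}

def toy_score(town_list):
    return toy_mse[frozenset(town_list)]

chosen, final = greedy_select(["A", "B", "C"], 2, toy_score)
print(chosen, final)  # ['B', 'C'] 1.5
```

Because the task asks for exactly k towns, this sketch adds the best candidate at every step even if the validation MSE does not improve. Greedy selection evaluates on the order of k times the number of towns model fits and is not guaranteed to find the best subset of size k.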
- No-intercept linear regression
  - Implement simple linear regression without an intercept for a single predictor x and target y.
  - First solve the batch case, where all data is available at once.
  - Then solve the streaming case, where (x, y) pairs arrive one at a time and the current slope must be updated without recomputing from scratch over all past data.
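
For the model y = b * x, minimizing the sum of squared errors gives the slope b = sum(x * y) / sum(x * x). Both cases can be sketched from that formula: the batch version computes the two sums once, and the streaming version keeps them as running totals so each new pair costs constant time and memory. The function and class names below are illustrative assumptions.

```python
def fit_slope_batch(xs, ys):
    """Least-squares slope for y = b * x using all data at once."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when every x is zero")
    return sxy / sxx

class StreamingSlope:
    """Maintains running sums so the slope updates in O(1) per pair."""

    def __init__(self):
        self.sxy = 0.0  # running sum of x * y
        self.sxx = 0.0  # running sum of x * x

    def update(self, x, y):
        """Fold in one (x, y) pair and return the current slope."""
        self.sxy += x * y
        self.sxx += x * x
        return self.slope

    @property
    def slope(self):
        # None until at least one pair with nonzero x has arrived.
        return self.sxy / self.sxx if self.sxx else None

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.3]
stream = StreamingSlope()
for x, y in zip(xs, ys):
    stream.update(x, y)
print(fit_slope_batch(xs, ys), stream.slope)
```

After all pairs have been consumed, the streaming slope matches the batch slope up to floating-point rounding, since both are built from the same two sums.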