You are given historical data from a shared city-bike system and asked to predict usage for a specific docking station during the next hour.
Assume you have access to:
- Hourly trip logs
- Current dock inventory and dock capacity snapshots
- Timestamps in the station's local timezone
- Weather data
- Holiday and local event indicators
- Station metadata such as neighborhood and nearby transit stops
Formulate the problem as predicting the number of bike check-outs from one dock in hour t+1.
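
As one possible reading of this formulation, the sketch below pairs each hour t with the check-out count observed in hour t+1. It is only an illustration under stated assumptions: the `hourly_log` dictionary, the helper names `dense_series` and `make_supervised_pairs`, and the choice to treat hours absent from the log as zero check-outs are all hypothetical and not part of the problem statement.

```python
from datetime import datetime, timedelta

def dense_series(hourly_log, first_hour, n_hours):
    """Return (hour, check-outs) for every hour in the window.

    Assumption: an hour missing from the log means zero check-outs.
    In real data it could also mean a logging gap.
    """
    hours = [first_hour + timedelta(hours=i) for i in range(n_hours)]
    return [(h, hourly_log.get(h, 0)) for h in hours]

def make_supervised_pairs(series):
    """Pair the observation at hour t with the count at hour t+1 as target."""
    return [
        {"hour": h, "checkouts_t": c, "target_t_plus_1": series[i + 1][1]}
        for i, (h, c) in enumerate(series[:-1])
    ]

# Hypothetical hourly log for a single dock (09:00 is absent).
hourly_log = {
    datetime(2024, 5, 1, 8): 2,
    datetime(2024, 5, 1, 10): 1,
}

series = dense_series(hourly_log, datetime(2024, 5, 1, 8), 3)
pairs = make_supervised_pairs(series)

for row in pairs:
    print(row["hour"], row["checkouts_t"], row["target_t_plus_1"])
```

The last hour in the window has no known target and is dropped, so the sketch never uses information from hour t+1 as an input for predicting hour t+1.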
Describe:
- How you would define the target and prediction horizon.
- What features you would engineer.
- What baseline and more advanced models you would consider.
- How you would split the data to avoid time leakage.
- Which evaluation metrics you would use and why.
- The most likely causes of overfitting in this problem.
- How you would prevent, detect, and diagnose overfitting.
- Important edge cases such as cold-start docks, supply constraints, and missing data.