You operate a shared city-bike system. For a given dock (station), you want to predict demand in the next hour.
Task
Design an approach to predict:
- Target: number of bike check-outs (or net bike outflow) from this dock in the next 1 hour.
Data (assume available)
- Historical trips: trip_id, start_station_id, end_station_id, start_time, end_time
- Station metadata: station_id, lat, lon, capacity
- Exogenous signals (optional but common): weather, holidays/events, nearby transit, current dock inventory (bikes available / docks available)
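To make the target concrete, here is a minimal sketch of how hourly check-outs and net outflow might be derived from the historical trips table. It assumes the trips are loaded into a pandas DataFrame with the column names listed above and that start_time and end_time are datetimes. The function name `hourly_targets`, the sign convention (net outflow = check-outs minus check-ins), and the tiny example table are illustrative assumptions, not part of the problem statement.

```python
import pandas as pd

def hourly_targets(trips: pd.DataFrame, station_id: int) -> pd.DataFrame:
    """Hourly check-outs, check-ins and net outflow for one station (hypothetical helper)."""
    starts = trips.loc[trips["start_station_id"] == station_id, "start_time"]
    ends = trips.loc[trips["end_station_id"] == station_id, "end_time"]
    # Bucket each event into the hour in which it occurred.
    out = pd.DataFrame({
        "checkouts": starts.dt.floor("h").value_counts(),
        "checkins": ends.dt.floor("h").value_counts(),
    })
    if out.empty:
        return out.assign(net_outflow=pd.Series(dtype=int))
    # Hours with no trips are real observations of zero demand, so fill them in.
    full_range = pd.date_range(out.index.min(), out.index.max(), freq="h")
    out = out.reindex(full_range).fillna(0).astype(int)
    # Assumed convention: positive means more bikes left than arrived.
    out["net_outflow"] = out["checkouts"] - out["checkins"]
    out.index.name = "hour"
    return out

# Made-up example trips, for illustration only.
trips = pd.DataFrame({
    "trip_id": [1, 2, 3, 4],
    "start_station_id": [10, 10, 20, 10],
    "end_station_id": [20, 30, 10, 20],
    "start_time": pd.to_datetime(["2024-01-01 08:05", "2024-01-01 08:40",
                                  "2024-01-01 08:10", "2024-01-01 10:15"]),
    "end_time": pd.to_datetime(["2024-01-01 08:20", "2024-01-01 09:10",
                                "2024-01-01 08:50", "2024-01-01 10:30"]),
})
targets = hourly_targets(trips, station_id=10)
print(targets)
```

Filling the empty hours with zeros matters here: dropping them would bias any model toward busy periods. Note also that observed check-outs are capped by bikes available, so they understate true demand whenever the dock is empty.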
Questions
- How would you formulate the prediction problem (regression vs classification vs time series)?
- What features would you build (time-based, lagged, seasonality, spatial, inventory constraints, weather)?
- What model families would you consider (baselines to advanced), and how would you evaluate them (metrics + train/validation split)?
- How would you prevent overfitting in this setting (feature design, regularization, validation strategy, leakage prevention)?
- What are key failure modes / edge cases (cold-start stations, special events, missing data, concept drift)?
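As one possible starting point for the feature, evaluation, and leakage questions above, the sketch below builds lagged and calendar features from an hourly check-out series, splits chronologically rather than randomly, and scores a seasonal-naive baseline (same hour one week earlier) with mean absolute error. The helper names `make_supervised` and `time_split`, the choice of lags, and the synthetic Poisson series are all assumptions made for illustration; real data and a different design may be more appropriate.

```python
import numpy as np
import pandas as pd

def make_supervised(hourly: pd.Series) -> pd.DataFrame:
    """Row h: target is the count in hour h; features use only hours before h."""
    df = pd.DataFrame({"y": hourly})
    for lag in (1, 24, 168):  # previous hour, previous day, previous week
        df[f"lag_{lag}h"] = hourly.shift(lag)
    df["hour"] = df.index.hour
    df["dow"] = df.index.dayofweek
    # Drop the warm-up rows whose lags are not yet available.
    return df.dropna()

def time_split(df: pd.DataFrame, valid_hours: int):
    """Chronological split: the last `valid_hours` rows form the validation set."""
    return df.iloc[:-valid_hours], df.iloc[-valid_hours:]

# Synthetic hourly check-outs with a daily cycle (illustration only, not real data).
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=24 * 21, freq="h")
rate = 3 + 2 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24)
hourly = pd.Series(rng.poisson(rate), index=idx)

data = make_supervised(hourly)
train, valid = time_split(data, valid_hours=48)

# Seasonal-naive baseline: predict the count from the same hour one week ago.
baseline_mae = (valid["y"] - valid["lag_168h"]).abs().mean()
print(f"train rows: {len(train)}, valid rows: {len(valid)}, baseline MAE: {baseline_mae:.2f}")
```

Because every feature is shifted by at least one hour and the validation window lies strictly after the training window, nothing from the future reaches the model. Any more advanced model would then need to beat this baseline on the same split to justify its complexity.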