How to forecast bike dock demand
Company: Two Sigma
Role: Data Scientist
Category: Machine Learning
Difficulty: easy
Interview Round: Technical Screen
You operate a shared city-bike system. For a given dock (station), you want to **predict demand in the next hour**.
## Task
Design an approach to predict:
- **Target:** number of bike check-outs from this dock in the next **1 hour** (or net bike outflow: check-outs minus returns over the same hour).
## Data (assume available)
- Historical trips: `trip_id, start_station_id, end_station_id, start_time, end_time`
- Station metadata: `station_id, lat, lon, capacity`
- Exogenous signals (optional but common): weather, holidays/events, nearby transit, current dock inventory (bikes available / docks available)
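As a minimal sketch of turning the trips table into an hourly target, assuming `start_time` and `end_time` are already parsed pandas datetimes in one local time zone: floor each timestamp to the hour, count check-outs (trips starting at the station) and returns (trips ending there), and reindex to a regular hourly grid. The helper name `hourly_flows` is hypothetical.

```python
import pandas as pd


def hourly_flows(trips: pd.DataFrame, station_id) -> pd.DataFrame:
    """Hourly check-outs, returns, and net outflow for one station (hypothetical helper)."""
    # Check-outs are trips that start here; returns are trips that end here.
    starts = trips.loc[trips["start_station_id"] == station_id, "start_time"]
    ends = trips.loc[trips["end_station_id"] == station_id, "end_time"]
    checkouts = starts.dt.floor("h").value_counts()
    returns = ends.dt.floor("h").value_counts()

    flows = pd.DataFrame({"checkouts": checkouts, "returns": returns}).fillna(0).astype(int)
    if flows.empty:
        flows["net_outflow"] = 0
        return flows
    # Regular hourly grid between first and last activity: hours with no trips become zeros.
    grid = pd.date_range(flows.index.min(), flows.index.max(), freq="h")
    flows = flows.reindex(grid, fill_value=0)
    flows["net_outflow"] = flows["checkouts"] - flows["returns"]
    return flows
```

Zero-filling assumes the trip log is complete. If the feed has outages, those hours should be marked missing rather than zero, otherwise a data gap looks like genuinely low demand.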
## Questions
1. How would you **formulate** the prediction problem (regression vs classification vs time series)?
2. What **features** would you build (time-based, lagged, seasonality, spatial, inventory constraints, weather)?
3. What model families would you consider (baselines to advanced), and how would you **evaluate** them (metrics + train/validation split)?
4. How would you **prevent overfitting** in this setting (feature design, regularization, validation strategy, leakage prevention)?
5. What are key **failure modes / edge cases** (cold-start stations, special events, missing data, concept drift)?
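For question 2, a minimal feature sketch, assuming the hourly target is a pandas `Series` on a regular `DatetimeIndex`. Every lag and rolling window is shifted by at least one hour, so the row for hour t only uses values observed before t begins; this is the main leakage guard. The function name and the specific lags (1 h, 24 h, 168 h) are illustrative choices, not a prescribed set. Weather, holiday, and inventory columns would be joined onto the same index.

```python
import numpy as np
import pandas as pd


def make_features(y: pd.Series) -> pd.DataFrame:
    """Calendar and lag features for predicting y at hour t using data up to t-1 (illustrative)."""
    X = pd.DataFrame(index=y.index)
    # Calendar / seasonality features.
    X["hour"] = y.index.hour
    X["dow"] = y.index.dayofweek
    X["is_weekend"] = (X["dow"] >= 5).astype(int)
    # Cyclical encoding so 23:00 and 00:00 are close together.
    X["hour_sin"] = np.sin(2 * np.pi * X["hour"] / 24)
    X["hour_cos"] = np.cos(2 * np.pi * X["hour"] / 24)
    # Lagged demand: previous hour, same hour yesterday, same hour last week.
    X["lag_1h"] = y.shift(1)
    X["lag_24h"] = y.shift(24)
    X["lag_168h"] = y.shift(168)
    # Rolling mean over the previous 24 hours (shift first so hour t is excluded).
    X["roll_mean_24h"] = y.shift(1).rolling(24).mean()
    return X
```

The shifts are tied to the 1-hour horizon: if forecasts had to be issued earlier (for example two hours ahead), every lag would need to grow by the same amount.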
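For questions 3 and 4, a sketch of chronological (rolling-origin) evaluation with a simple baseline: mean demand per hour-of-week slot, fitted only on data before each test block. Random K-fold splits would let a model train on the future and overstate accuracy, so every test block here starts after its training window. The function names, the four-week minimum training window, and the one-week horizon are assumptions for illustration. A stronger model (for instance a Poisson GLM or gradient boosting on lag features) would be scored on the same folds and should beat this baseline to justify its complexity.

```python
import numpy as np
import pandas as pd


def rolling_origin_folds(n, min_train, horizon, n_folds):
    """Expanding-window folds: each test block starts right after its training block."""
    for k in range(n_folds):
        test_start = min_train + k * horizon
        test_end = test_start + horizon
        if test_end > n:
            break
        yield np.arange(test_start), np.arange(test_start, test_end)


def _slot(idx: pd.DatetimeIndex) -> pd.Index:
    # Hour-of-week slot: 0 = Monday 00:00, ..., 167 = Sunday 23:00.
    return idx.dayofweek * 24 + idx.hour


def hour_of_week_mean(train: pd.Series, test_index: pd.DatetimeIndex) -> np.ndarray:
    """Baseline forecast: mean demand per hour-of-week slot, fitted on train only."""
    profile = train.groupby(_slot(train.index)).mean()
    # Slots never seen in training fall back to the overall training mean.
    return profile.reindex(_slot(test_index)).fillna(train.mean()).to_numpy()


def evaluate_baseline(y: pd.Series, min_train=24 * 7 * 4, horizon=24 * 7, n_folds=4):
    """MAE of the hour-of-week baseline on each rolling-origin fold."""
    maes = []
    for train_idx, test_idx in rolling_origin_folds(len(y), min_train, horizon, n_folds):
        train, test = y.iloc[train_idx], y.iloc[test_idx]
        pred = hour_of_week_mean(train, test.index)
        maes.append(float(np.mean(np.abs(test.to_numpy() - pred))))
    return maes
```

Reporting the per-fold MAE (rather than one pooled number) also shows whether error drifts upward in later folds, an early signal of concept drift.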
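For the cold-start case in question 5, one heuristic prior for a new station is to borrow demand profiles from its nearest existing stations using the `lat`, `lon`, and `capacity` metadata. The sketch below takes an inverse-distance-weighted average of the k nearest neighbors' hour-of-week profiles, expressed per dock and rescaled by the new station's capacity. The capacity scaling, the 50 m distance floor, `k=3`, and the function names are all assumptions rather than established rules; the prior would be blended with, then replaced by, the station's own history as it accumulates.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (inputs in degrees; arrays broadcast)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def cold_start_profile(new_lat, new_lon, new_capacity,
                       nbr_lat, nbr_lon, nbr_capacity, nbr_profiles, k=3):
    """Hypothetical prior: IDW average of the k nearest stations' per-dock demand profiles.

    nbr_profiles has shape (n_stations, n_slots), e.g. 168 hour-of-week means per station.
    """
    profiles = np.asarray(nbr_profiles, dtype=float)
    capacity = np.asarray(nbr_capacity, dtype=float)
    d = haversine_km(new_lat, new_lon, np.asarray(nbr_lat, dtype=float), np.asarray(nbr_lon, dtype=float))
    nearest = np.argsort(d)[:k]
    # 0.05 km (50 m) floor so a co-located neighbor does not get infinite weight.
    w = 1.0 / (d[nearest] + 0.05)
    w /= w.sum()
    # Assumption: demand scales with dock count, so average per-dock demand, then rescale.
    per_dock = profiles[nearest] / capacity[nearest, None]
    return new_capacity * (w @ per_dock)
```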
Quick Answer: This question evaluates competency in time-series forecasting and demand prediction, including feature engineering, model selection and evaluation, handling nonstationarity, data leakage, overfitting, and operational failure modes.