Predict Bike Dock Demand
Company: Two Sigma
Role: Data Scientist
Category: Machine Learning
Difficulty: Hard
Interview Round: Technical Screen
You are working on a docked bike-sharing system. Build a model that predicts how many bikes will be checked out from a specific dock in the next hour.
Assume you have access to:
- `trips(trip_id, start_time, start_station_id, end_station_id, user_type)`
- `station_status(station_id, ts, bikes_available, docks_available, capacity)`
- `weather(ts, temperature, precipitation, wind_speed)`
- `calendar(date, is_holiday, is_weekend, special_event)`
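One way to make the target concrete is to aggregate `trips` into an hourly panel of checkouts per station, keeping hours with zero demand as explicit rows. The sketch below is a minimal illustration under stated assumptions: it treats "dock" as the station identified by `start_station_id`, the function name `hourly_checkouts` is hypothetical, and the toy trips are made up. Observed checkouts may also understate true demand in hours when `bikes_available` was 0; this sketch does not correct for that.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_checkouts(trips, stations, start, end):
    """Count checkouts per (station_id, hour_start) over [start, end).

    `trips` is an iterable of (start_time, start_station_id) pairs.
    Hours with no trips are kept with a count of 0 so that the model
    also sees zero-demand periods.
    """
    counts = Counter()
    for start_time, station_id in trips:
        # Truncate the timestamp to the start of its hour.
        hour = start_time.replace(minute=0, second=0, microsecond=0)
        if start <= hour < end:
            counts[(station_id, hour)] += 1

    panel = {}
    hour = start
    while hour < end:
        for station_id in stations:
            panel[(station_id, hour)] = counts.get((station_id, hour), 0)
        hour += timedelta(hours=1)
    return panel


# Toy data, invented for illustration only.
trips = [
    (datetime(2024, 5, 1, 8, 5), "A"),
    (datetime(2024, 5, 1, 8, 40), "A"),
    (datetime(2024, 5, 1, 9, 15), "B"),
]
panel = hourly_checkouts(
    trips, ["A", "B"], datetime(2024, 5, 1, 8), datetime(2024, 5, 1, 10)
)
```

Each key of `panel` is one unit of analysis (a station-hour), and its value is the count a model would be asked to predict.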
Discuss:
1. How you would define the prediction target and unit of analysis.
2. What features you would engineer without leaking future information.
3. What model family you would start with and why.
4. Which evaluation metrics you would use, and how your choice changes if the business cares more about stock-outs than raw count error.
5. How you would split train and validation data for a time-series problem.
6. How you would prevent overfitting and handle cold-start stations, missing data, and distribution shifts such as severe weather or holidays.
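Items 2 and 5 can be illustrated together. The following is a rough sketch, not a definitive pipeline: it builds lag features that use only hours strictly before the target hour, then splits rows by a time cutoff instead of at random. The names `make_rows` and `time_split`, the lag choices (1 hour, 1 day, 1 week), and the synthetic series are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def make_rows(series, lags=(1, 24, 168)):
    """Build (hour, features, target) rows for one station.

    `series` maps hour_start -> checkouts in that hour. Every feature
    comes from an hour strictly before the target hour, so nothing from
    the future leaks into the inputs.
    """
    rows = []
    for hour in sorted(series):
        past_hours = [hour - timedelta(hours=lag) for lag in lags]
        # Skip hours that do not yet have enough history.
        if any(p not in series for p in past_hours):
            continue
        feats = {f"lag_{lag}h": series[p] for lag, p in zip(lags, past_hours)}
        feats["hour_of_day"] = hour.hour
        feats["day_of_week"] = hour.weekday()
        rows.append((hour, feats, series[hour]))
    return rows


def time_split(rows, cutoff):
    """Chronological split: train strictly before `cutoff`, validate after."""
    train = [r for r in rows if r[0] < cutoff]
    valid = [r for r in rows if r[0] >= cutoff]
    return train, valid


# Synthetic 10-day hourly series, invented for illustration only.
start = datetime(2024, 1, 1)
series = {start + timedelta(hours=i): (i % 24) % 5 for i in range(240)}

rows = make_rows(series)
train, valid = time_split(rows, cutoff=start + timedelta(days=9))
```

In practice the same cutoff idea is often repeated over several rolling windows, so that validation covers more than one period.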
Quick Answer: This question evaluates a data scientist's competency in time-series demand forecasting: defining the prediction target, engineering temporal features without leakage, selecting a model family, aligning evaluation metrics with business priorities, splitting data chronologically, and handling missing data, cold-start stations, and distribution shifts.
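To make the metric trade-off in item 4 concrete: if stock-outs matter more than raw count error, under-forecasting demand is the costlier mistake, and one common option is to forecast an upper quantile and score it with pinball (quantile) loss. The sketch below assumes a quantile level of 0.9 and uses made-up numbers; both are illustrative choices, not values from the question.

```python
def mae(y_true, y_pred):
    """Mean absolute error: penalizes over- and under-forecasts equally."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss at level q in (0, 1).

    Under-forecasts (y > p) are weighted by q and over-forecasts by
    (1 - q), so a high q makes under-forecasting more expensive.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


# Invented numbers for illustration only.
actual = [10, 10, 10, 10]
under = [8, 8, 8, 8]      # forecasts 2 bikes too low
over = [12, 12, 12, 12]   # forecasts 2 bikes too high

mae_under, mae_over = mae(actual, under), mae(actual, over)
pin_under = pinball_loss(actual, under, q=0.9)
pin_over = pinball_loss(actual, over, q=0.9)
```

Here MAE cannot tell the two forecasts apart, while pinball loss at the 0.9 level scores the under-forecast as the worse of the two.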