Design real-time live-stream recommendations
Company: Twitch
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
Design a real-time recommendation system for live streams. Address the following:
1) Objective and labels: Define your primary objective (e.g., maximize post-click 10-minute watch probability) and construct training labels from view_events, including delayed outcomes and censoring when streams end.
2) Features: List user, creator, stream, and context features, and describe how you’d handle cold start for new creators and new users. Explain how you’d incorporate short-term session signals and long-term embeddings.
3) Model and loss: Choose a model family (e.g., two-tower retrieval + re-ranker with calibrated probabilities). Specify the loss, negative sampling strategy (in-batch + hard negatives), and how you’d address severe class imbalance.
4) Feedback loops and bias: Mitigate position bias and popularity bias via counterfactual estimation (IPS/DR) or randomized exploration. Describe your exploration policy (e.g., Thompson Sampling or UCB) and safety constraints.
5) Evaluation: Define offline metrics (AUC, PR-AUC, calibration, NDCG@k) and online metrics (watch_time/viewer, session_length, bounce_rate). Explain how you’d verify that offline gains reliably predict online gains, using replay evaluation and interleaving.
6) Serving: Provide an end-to-end latency budget (<100 ms p95) broken down across retrieval, feature fetch, and ranking. Describe how you’d keep features fresh (e.g., a streamer going live) and how you’d update embeddings in near-real time.
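As an illustration of the label construction asked for in item 1, here is a minimal sketch of turning one click into a 10-minute watch label while handling delayed outcomes and censoring. The `ViewEvent` fields, the `make_label` helper, and the `log_cutoff_ts` parameter are hypothetical names chosen for this example; the actual view_events schema is not given in the prompt.

```python
from dataclasses import dataclass
from typing import Optional

HORIZON_S = 600  # 10-minute watch target, taken from the example objective


@dataclass
class ViewEvent:
    # Hypothetical schema; all timestamps are in seconds.
    click_ts: float                 # viewer clicked the recommended stream
    watch_end_ts: Optional[float]   # viewer left; None if still watching at cutoff
    stream_end_ts: Optional[float]  # stream went offline; None if still live


def make_label(ev: ViewEvent, log_cutoff_ts: float) -> Optional[int]:
    """Return 1 (watched >= horizon), 0 (left early), or None (censored or pending)."""
    deadline = ev.click_ts + HORIZON_S

    # Latest moment at which the viewer is known to have been watching.
    last_seen = ev.watch_end_ts if ev.watch_end_ts is not None else log_cutoff_ts
    if ev.stream_end_ts is not None:
        last_seen = min(last_seen, ev.stream_end_ts)

    if last_seen >= deadline:
        return 1

    # Censored: the stream ended before the deadline while the viewer was
    # still present, so the 10-minute outcome could never be observed.
    stream_cut_short = ev.stream_end_ts is not None and ev.stream_end_ts < deadline
    stayed_until_end = ev.watch_end_ts is None or (
        ev.stream_end_ts is not None and ev.watch_end_ts >= ev.stream_end_ts
    )
    if stream_cut_short and stayed_until_end:
        return None

    # Pending: the viewer is still watching at log cutoff (delayed outcome).
    if ev.watch_end_ts is None:
        return None

    # The viewer left voluntarily before the deadline.
    return 0
```

Rows labeled `None` would typically be excluded from training, revisited once the outcome is known, or handled with a survival-style model; which of these is appropriate is a design choice the candidate should justify.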
Quick Answer: This question evaluates competency in designing real-time recommender systems within the Machine Learning domain, covering objectives and label construction for streaming events, feature engineering across user/creator/stream/context, model and loss choices, bias mitigation, evaluation metrics, and low-latency serving and freshness constraints.
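For the counterfactual estimation mentioned in item 4, a minimal sketch of a clipped inverse propensity scoring (IPS) estimator and its self-normalized variant (SNIPS) may help make the idea concrete. It assumes the logging policy recorded the probability of showing each stream; the function names, argument names, and the `clip` default are illustrative assumptions, not part of the prompt.

```python
from typing import Sequence


def _check(rewards: Sequence[float], logging_probs: Sequence[float],
           target_probs: Sequence[float]) -> None:
    # Basic validation shared by both estimators.
    if not rewards:
        raise ValueError("need at least one logged impression")
    if not (len(rewards) == len(logging_probs) == len(target_probs)):
        raise ValueError("inputs must have equal length")
    if any(p <= 0 for p in logging_probs):
        raise ValueError("logging propensities must be positive")


def ips_estimate(rewards: Sequence[float], logging_probs: Sequence[float],
                 target_probs: Sequence[float], clip: float = 10.0) -> float:
    """Clipped IPS estimate of the target policy's mean reward."""
    _check(rewards, logging_probs, target_probs)
    total = 0.0
    for r, p_log, p_new in zip(rewards, logging_probs, target_probs):
        # Importance weight, clipped to limit variance (at the cost of bias).
        total += min(p_new / p_log, clip) * r
    return total / len(rewards)


def snips_estimate(rewards: Sequence[float], logging_probs: Sequence[float],
                   target_probs: Sequence[float], clip: float = 10.0) -> float:
    """Self-normalized IPS: divide by the sum of weights instead of n."""
    _check(rewards, logging_probs, target_probs)
    num = 0.0
    den = 0.0
    for r, p_log, p_new in zip(rewards, logging_probs, target_probs):
        w = min(p_new / p_log, clip)
        num += w * r
        den += w
    return num / den if den > 0 else 0.0
```

As a usage example, with rewards `[1, 0, 1, 0]`, logging propensities `[0.5, 0.5, 0.25, 0.25]`, and target probabilities of `0.5` throughout, the importance weights are 1, 1, 2, 2, so `ips_estimate` returns 0.75 and `snips_estimate` returns 0.5. A doubly robust (DR) estimator would additionally combine these weights with a learned reward model, which is not shown here.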