Design a Real-Time Recommendation System for Live Streams
Context: You are designing a recommender for a large live-streaming platform. Assume you have standard logs: impressions, clicks, view_events with join/leave timestamps, and stream metadata. Address the following:
- Objective and Labels
  - Define your primary objective (e.g., maximize post-click 10-minute watch probability).
  - Construct training labels from view_events, including how to handle delayed outcomes and right-censoring when a stream ends before the label horizon.
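
As a starting point for the labeling question, the sketch below derives a label from one view event for the example objective (a 10-minute watch). It is a minimal illustration, not a prescribed answer: the function name `make_label`, its arguments, and the four outcome strings are assumptions, and timestamps are taken to be seconds.

```python
from typing import Optional

HORIZON_S = 600  # 10-minute horizon from the example objective


def make_label(join_ts: float,
               leave_ts: Optional[float],
               stream_end_ts: Optional[float],
               now_ts: float,
               horizon_s: float = HORIZON_S) -> str:
    """Classify one view event as positive, negative, censored, or pending."""
    # Latest time at which the viewer is known to have still been watching.
    known = [t for t in (leave_ts, stream_end_ts, now_ts) if t is not None]
    observed_until = min(known)
    if observed_until - join_ts >= horizon_s:
        return "positive"
    # The viewer left while the stream was still live: a true negative,
    # known as soon as the leave event arrives.
    if leave_ts is not None and (stream_end_ts is None or leave_ts < stream_end_ts):
        return "negative"
    # The stream ended before the horizon with the viewer still present:
    # the outcome is right-censored, not negative.
    if stream_end_ts is not None and stream_end_ts <= now_ts:
        return "censored"
    # Still watching and the horizon has not elapsed: delayed outcome.
    return "pending"
```

Rows marked "pending" would be joined again once the horizon has passed; rows marked "censored" could be dropped, down-weighted, or handled with a survival-style model, depending on the design you choose.
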
- Features
  - List user, creator, stream, and context features.
  - Explain cold-start handling for new creators and new users.
  - Describe how you would incorporate short-term session signals and long-term embeddings.
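
One possible way to combine short-term and long-term signals is sketched below: a recency-weighted average of the embeddings of streams watched in the current session, blended with a precomputed long-term user embedding. The function names, the half-life, and the blend weight `alpha` are illustrative assumptions. The fallbacks show one simple cold-start behavior, where a new user with no long-term embedding is represented by session activity alone.

```python
def decayed_session_vector(events, now_ts, half_life_s=300.0):
    """events: list of (timestamp_s, embedding) for streams watched this session."""
    if not events:
        return None
    dim = len(events[0][1])
    total = [0.0] * dim
    weight_sum = 0.0
    for ts, emb in events:
        # Exponential decay: an event one half-life old counts half as much.
        w = 0.5 ** ((now_ts - ts) / half_life_s)
        weight_sum += w
        for i, x in enumerate(emb):
            total[i] += w * x
    return [x / weight_sum for x in total]


def user_vector(long_term, session, alpha=0.5):
    """Blend the two views; fall back to whichever one exists (cold start)."""
    if long_term is None:
        return session
    if session is None:
        return long_term
    return [alpha * s + (1 - alpha) * l for s, l in zip(session, long_term)]
```
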
- Model and Loss
  - Choose a model family (e.g., two-tower retrieval + re-ranker with calibrated probabilities).
  - Specify the loss function(s), negative sampling strategy (in-batch + hard negatives), and how to address severe class imbalance.
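
For the retrieval stage, one common choice is a softmax cross-entropy loss with in-batch negatives: each user in a batch is paired with the stream they engaged with, and the other streams in the batch act as negatives. The sketch below computes that loss in plain Python for clarity. The function name and the `temperature` parameter are assumptions, and it leaves out hard negatives and the sampling-bias correction a production system would likely add.

```python
import math


def in_batch_softmax_loss(user_vecs, item_vecs, temperature=1.0):
    """Mean cross-entropy where item_vecs[i] is the positive for user_vecs[i]."""
    n = len(user_vecs)
    total = 0.0
    for i, u in enumerate(user_vecs):
        # Score this user against every item in the batch.
        logits = [sum(a * b for a, b in zip(u, v)) / temperature for v in item_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching (diagonal) item.
        total += log_z - logits[i]
    return total / n
```

The loss is lower when each user vector scores its own item above the other items in the batch, which is the behavior the retrieval tower is trained toward.
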
- Feedback Loops and Bias
  - Mitigate position bias and popularity bias using counterfactual estimation (IPS/DR) or randomized exploration.
  - Describe your exploration policy (e.g., Thompson Sampling or UCB) and safety constraints.
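
To make the counterfactual-estimation item concrete, here is a minimal sketch of a clipped inverse propensity scoring (IPS) estimator. It assumes the logs record the probability with which the logging policy showed each item. The log tuple layout, the `target_prob` callback, and the clip value are illustrative assumptions, not a fixed interface.

```python
def clipped_ips(logs, target_prob, clip=10.0):
    """Estimate the mean reward of a target policy from logged data.

    logs: iterable of (context, action, reward, logging_prob).
    target_prob(context, action): probability the target policy picks `action`.
    """
    total = 0.0
    n = 0
    for context, action, reward, p_log in logs:
        if p_log <= 0.0:
            raise ValueError("logging probability must be positive")
        # Importance weight, clipped to limit variance (at the cost of some bias).
        weight = min(target_prob(context, action) / p_log, clip)
        total += weight * reward
        n += 1
    return total / n if n else 0.0
```

The estimate is only meaningful when the logging policy gave nonzero probability to every action the target policy might take, which is one reason a small amount of randomized exploration is usually kept in production traffic.
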
- Evaluation
  - Define offline metrics (AUC, PR-AUC, calibration, NDCG@k) and online metrics (watch_time/viewer, session_length, bounce_rate).
  - Explain how to establish reliable offline-to-online correlation using replay evaluation and interleaving.
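
Of the offline metrics listed above, NDCG@k is the one most often implemented inconsistently, so a short reference sketch may help. This version assumes linear gain (the relevance value itself rather than 2^rel - 1) and takes the relevance labels in the order the model ranked them. The function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k positions (linear gain)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([1, 0, 0], 3)` scores a list with the single relevant stream in first place, while `ndcg_at_k([0, 1], 2)` scores the same stream pushed to second place and is lower.
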
- Serving and Freshness
  - Provide an end-to-end latency budget (<100 ms p95) including retrieval, feature fetch, and ranking.
  - Describe feature freshness (e.g., streamer going live) and how you update embeddings in near real time.
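
A latency budget is easiest to discuss when it is written down per stage. The sketch below checks a set of per-stage allocations against the 100 ms target. The stage names and the millisecond values are purely hypothetical placeholders, not measurements, and the stages are assumed to run sequentially.

```python
BUDGET_MS = 100  # end-to-end p95 target from the requirement above

# Hypothetical per-stage allocations in milliseconds (illustrative only).
STAGE_BUDGET_MS = {
    "candidate_retrieval": 20,
    "feature_fetch": 25,
    "ranking": 35,
    "network_and_serialization": 10,
}


def remaining_headroom_ms(stages, budget_ms=BUDGET_MS):
    """Return unallocated milliseconds, or raise if the plan is over budget."""
    used = sum(stages.values())
    if used > budget_ms:
        raise ValueError(f"over budget: {used} ms allocated, {budget_ms} ms allowed")
    return budget_ms - used
```

Adding per-stage allocations is a planning heuristic only: percentiles do not add exactly across stages, so the end-to-end p95 still has to be measured directly in production.
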