Design a short-video recommender system
Company: LinkedIn
Role: Data Scientist
Category: Machine Learning
Difficulty: easy
Interview Round: Technical Screen
## ML System Design — Short-video recommendation
Design an end-to-end recommendation system for a short-video feed (TikTok/Reels-style). Walk through the full pipeline:
### 1) Objective and constraints
- Define the product goal (e.g., maximize long-term user value).
- State constraints: latency budget, freshness, exploration, safety/compliance.
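One common way to make "long-term user value" concrete is a weighted sum of predicted engagement signals, with safety treated as a hard constraint rather than another term to trade off. The sketch below assumes a multi-task ranker that outputs the listed heads. `WEIGHTS`, `Predictions`, `value_score`, and the `max_violation` threshold are hypothetical names and values chosen for illustration, not a known production formula.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; in practice they would be tuned against long-term metrics
# (e.g. retention) through online experiments rather than fixed by hand.
WEIGHTS = {
    "p_complete": 1.0,
    "exp_watch_sec": 0.05,
    "p_like": 2.0,
    "p_share": 4.0,
    "p_follow": 6.0,
    "p_skip": -1.5,
}

@dataclass
class Predictions:
    """Per-(user, video) outputs of a multi-task ranker (assumed heads)."""
    p_complete: float
    exp_watch_sec: float
    p_like: float
    p_share: float
    p_follow: float
    p_skip: float
    p_policy_violation: float  # from a separate safety classifier (assumed)

def value_score(pred: Predictions, max_violation: float = 0.2) -> Optional[float]:
    """Weighted sum of engagement heads; None means the item is filtered by the safety constraint."""
    if pred.p_policy_violation > max_violation:
        return None  # constraint, not a trade-off: engagement cannot buy back a violation
    return sum(w * getattr(pred, name) for name, w in WEIGHTS.items())
```

Keeping the weights explicit makes the objective auditable: a product decision such as "value follows more than likes" becomes a reviewable number instead of something buried inside a single learned score.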
### 2) Labels and training data
- What labels would you use (watch time, completion, likes, shares, follows, skips)?
- How do you handle **delayed feedback**, **position bias**, and **negative sampling**?
- How do you construct training examples (impressions, sessions, user-video pairs)?
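A minimal sketch of impression-level example construction follows. It assumes engagements arrive as `(user_id, video_id, ts, kind)` tuples, a 24-hour attribution window, a 90% completion threshold, a 2-second skip threshold, and a simple `(1 / position) ** eta` examination model for position bias. All of these thresholds, helper names, and the propensity model are illustrative assumptions. Impressions whose window is still open are held back so that a not-yet-arrived like is not logged as a negative.

```python
import random
from dataclasses import dataclass

ATTRIBUTION_WINDOW_SEC = 24 * 3600  # assumed: credit engagements within 24h of the impression
COMPLETION_RATIO = 0.9              # assumed: >= 90% watched counts as a completion
SKIP_SEC = 2.0                      # assumed: < 2s watched counts as a skip

@dataclass
class Impression:
    user_id: str
    video_id: str
    ts: float          # impression timestamp (seconds)
    position: int      # 1-based slot in the feed
    watch_sec: float
    duration_sec: float

def examine_propensity(position: int, eta: float = 1.0) -> float:
    """Assumed position-bias model: P(examined | position) = (1 / position) ** eta."""
    return (1.0 / position) ** eta

def build_examples(impressions, engagements, now, max_weight=10.0):
    """engagements: list of (user_id, video_id, ts, kind), kind in {"like", "share", "follow"}."""
    by_pair = {}
    for user_id, video_id, ts, kind in engagements:
        by_pair.setdefault((user_id, video_id), []).append((ts, kind))
    examples = []
    for imp in impressions:
        if now - imp.ts < ATTRIBUTION_WINDOW_SEC:
            continue  # delayed feedback: window still open, so "no like yet" is not a negative yet
        kinds = {kind for ts, kind in by_pair.get((imp.user_id, imp.video_id), [])
                 if imp.ts <= ts <= imp.ts + ATTRIBUTION_WINDOW_SEC}
        examples.append({
            "user_id": imp.user_id,
            "video_id": imp.video_id,
            "complete": int(imp.watch_sec >= COMPLETION_RATIO * imp.duration_sec),
            "skip": int(imp.watch_sec < SKIP_SEC),
            "like": int("like" in kinds),
            "share": int("share" in kinds),
            "follow": int("follow" in kinds),
            # IPS weight (clipped to limit variance); in IPS-style training it rescales
            # the loss on positive labels observed at low-examination positions
            "weight": min(1.0 / examine_propensity(imp.position), max_weight),
        })
    return examples

def sample_negatives(positives, catalog, k, rng):
    """Uniform random negatives for retrieval training, excluding items the user engaged with."""
    pool = [v for v in catalog if v not in positives]
    return rng.sample(pool, min(k, len(pool)))
```

Uniform negatives are only one choice; in-batch negatives (with a popularity correction) or hard negatives mined from retrieved-but-skipped items are common alternatives with different bias and cost trade-offs.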
### 3) Features
- User features (history, embeddings, time-of-day, locale)
- Item/video features (content, creator, freshness)
- Context features (device, network, entry point)
- Cross features / interactions
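The four feature groups can be illustrated with a small feature builder that hashes sparse IDs and explicit crosses into a fixed bucket space. The input dict keys (`locale`, `recent_creators`, `duration_sec`, `network`, and so on), the bucket count, and the 60-second duration split are hypothetical choices for this sketch, not a prescribed schema.

```python
import hashlib

NUM_BUCKETS = 2 ** 20  # assumed size of the hashed ID / cross-feature space

def bucket(*parts: str) -> int:
    """Stable feature hashing; md5 is used because Python's built-in hash() is salted per process."""
    key = "|".join(parts).encode("utf-8")
    return int.from_bytes(hashlib.md5(key).digest()[:8], "little") % NUM_BUCKETS

def build_features(user: dict, video: dict, ctx: dict) -> dict:
    duration_bucket = "long" if video["duration_sec"] > 60 else "short"
    return {
        # user features
        "user_locale": bucket("locale", user["locale"]),
        "hour_of_day": ctx["hour"],
        # recent creators share the "creator" namespace with creator_id so embeddings can be shared
        "recent_creators": [bucket("creator", c) for c in user["recent_creators"][-50:]],
        # item features
        "video_id": bucket("video", video["id"]),
        "creator_id": bucket("creator", video["creator_id"]),
        "video_age_hours": (ctx["now"] - video["upload_ts"]) / 3600.0,  # freshness
        # context features
        "device": bucket("device", ctx["device"]),
        "entry_point": bucket("entry", ctx["entry_point"]),
        # cross features
        "locale_x_language": bucket("locale_x_lang", user["locale"], video["language"]),
        "network_x_duration": bucket("net_x_dur", ctx["network"], duration_bucket),
        "follows_creator": int(video["creator_id"] in user["followed_creators"]),
    }
```

Explicit crosses like `locale_x_language` help shallow models such as GBDTs or linear layers; a deep ranker can also learn interactions implicitly, but cheap, interpretable crosses often remain useful.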
### 4) “Three-stage” recommendation architecture
Describe a standard 3-stage system:
1. **Candidate generation (retrieval)**
2. **Ranking**
3. **Re-ranking / post-processing** (diversity, constraints, business rules)
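The three stages can be sketched end to end with toy components. In the sketch below, brute-force dot-product scoring stands in for an approximate nearest-neighbor index over two-tower embeddings, `score_fn` stands in for the heavier ranking model, and a per-creator cap stands in for the re-ranking rules. All function names are illustrative.

```python
import heapq
from typing import Callable, Dict, List, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(user_emb: Sequence[float], item_embs: Dict[str, Sequence[float]], n: int) -> List[str]:
    """Stage 1 (retrieval): score items by user-tower x item-tower dot product.
    Brute force stands in for an approximate nearest-neighbor index."""
    return heapq.nlargest(n, item_embs, key=lambda vid: dot(user_emb, item_embs[vid]))

def rank(candidates: List[str], score_fn: Callable[[str], float]) -> List[str]:
    """Stage 2 (ranking): order the retrieved candidates with a heavier model (score_fn stand-in)."""
    return sorted(candidates, key=score_fn, reverse=True)

def rerank(ranked: List[str], creator_of: Dict[str, str], k: int, max_per_creator: int = 1) -> List[str]:
    """Stage 3 (re-ranking): greedy pass enforcing a per-creator cap on the final slate."""
    slate: List[str] = []
    counts: Dict[str, int] = {}
    for vid in ranked:
        creator = creator_of[vid]
        if counts.get(creator, 0) >= max_per_creator:
            continue  # diversity rule: skip instead of showing the same creator again
        slate.append(vid)
        counts[creator] = counts.get(creator, 0) + 1
        if len(slate) == k:
            break
    return slate
```

The split exists because each stage trades recall for cost: retrieval must be cheap enough to touch a very large corpus, ranking can afford richer features on a much smaller set, and re-ranking operates on the slate as a whole, which per-item scores cannot capture.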
For each stage:
- Model family choices (e.g., two-tower, GBDT, deep ranker)
- Serving architecture and latency considerations
- How you ensure freshness and handle cold start
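For cold start, one option is to shrink a new video's observed completion rate toward a prior, such as its creator's historical rate, and add a UCB-style exploration bonus that decays as the video accumulates impressions. The prior strength of 20 and the bonus coefficient `c` below are assumed values for illustration; Thompson sampling is a common alternative to the UCB bonus.

```python
import math

def smoothed_rate(successes: float, trials: float, prior_rate: float, prior_strength: float = 20.0) -> float:
    """Shrink a video's observed completion rate toward a prior (e.g. its creator's historical rate)."""
    return (successes + prior_strength * prior_rate) / (trials + prior_strength)

def explore_score(successes: float, trials: float, prior_rate: float, total_trials: float, c: float = 0.5) -> float:
    """UCB-style score: smoothed estimate plus a bonus that shrinks as the video gets impressions."""
    bonus = c * math.sqrt(math.log(total_trials + 1) / (trials + 1))
    return smoothed_rate(successes, trials, prior_rate) + bonus
```

With zero impressions the estimate equals the prior, so a brand-new video from a strong creator starts with a sensible score instead of 0 or an unstable 1/1, while the bonus guarantees some exploration traffic.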
### 5) Offline evaluation and model selection
- Which offline metrics would you use and why?
- How do you validate correlation with online metrics?
- How do you avoid offline-to-online mismatch?
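A sketch of typical offline metrics: recall@k for the retrieval stage, NDCG@k with graded relevance for the ranking stage, and a Spearman rank correlation as one way to check whether offline lifts have tracked online lifts across past experiments. The graded gain scale in the docstring is an assumption, and the Spearman helper assumes no tied values.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Retrieval: fraction of the user's relevant videos that appear in the top-k candidates."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(ranked, gains, k):
    """Ranking: gains maps video_id -> graded relevance (e.g. 0 skip, 1 watched, 2 completed, 3 shared)."""
    dcg = sum(gains.get(v, 0.0) / math.log2(i + 2) for i, v in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def spearman(xs, ys):
    """Rank correlation between offline and online metric deltas of past launches (assumes no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for pos, i in enumerate(order):
            r[i] = pos
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A low or unstable correlation across past launches is a signal that the offline metric, or the way it is computed (for example, a random rather than time-based split), is not a reliable proxy for the online objective.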
### 6) Online experimentation
- A/B test design, primary/guardrail metrics, ramp strategy
- Debugging when offline improves but online regresses
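A minimal analysis toolkit for the experiment, using only the Python standard library: a sample-size estimate for a given minimum detectable effect, a large-sample z-test on per-user metric means, and a sample-ratio-mismatch (SRM) check, which is often the first thing to rule out when offline and online results disagree. Function names and defaults are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, variance

def sample_size_per_arm(sigma, mde, alpha=0.05, power=0.8):
    """Users per arm to detect an absolute lift `mde` in a metric with std-dev `sigma` (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / mde) ** 2)

def diff_in_means_z(control, treatment):
    """Two-sample z-test on per-user metric values (large-sample, unequal variances)."""
    se = math.sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    delta = mean(treatment) - mean(control)
    z = delta / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return delta, z, p

def srm_p_value(n_control, n_treatment, expected_ratio=0.5):
    """Sample-ratio-mismatch check: chi-square (1 dof) test of observed vs. expected split."""
    n = n_control + n_treatment
    exp_c = n * expected_ratio
    exp_t = n - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    # a chi-square statistic with 1 dof is the square of a standard normal
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
```

Analyzing per-user values rather than per-impression values keeps the unit of analysis equal to the unit of randomization; treating impressions as independent typically understates the variance.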
### 7) Safety, robustness, and monitoring
- How to prevent harmful content amplification
- Monitoring (data drift, performance drift, quality regressions)
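For data drift, one widely used statistic is the Population Stability Index (PSI) between a reference window and a live window of a feature or model score. The sketch below uses equal-width bins over the reference range (quantile bins are a common alternative) and an `eps` floor to avoid taking the log of zero; the 0.1 / 0.25 alert thresholds often quoted for PSI are rules of thumb, not standards.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference window and a live window."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range live values into edge bins
        return [max(c / len(xs), eps) for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Tracking PSI per feature and on the ranker's output score, sliced by locale and entry point, helps separate an upstream logging change from genuine shifts in user behavior.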
Be specific and justify trade-offs.
Quick Answer: This question evaluates a data scientist's competency in end-to-end machine learning system design for recommender systems, including retrieval and ranking architecture, feature and label design, offline and online evaluation, serving and latency constraints, and monitoring for safety and robustness.