Design a Personalized Content Recommendation Engine
- Company: ByteDance
- Role: Software Engineer
- Category: ML System Design
- Difficulty: medium
- Interview Round: Technical Screen
# Design a Personalized Content Recommendation Engine
You are asked to design the recommendation engine that powers the personalized home feed of a large content-sharing platform (think a short-video or article feed). When a user opens the app, the system must return an ordered list of items the user is most likely to engage with, and continuously refresh that list as the user scrolls.
Design this recommendation engine end to end: how you frame it as a machine-learning problem, how you generate and rank candidates from a huge catalog, how you serve recommendations at low latency and high scale, and how you evaluate and monitor the system in production.
### Constraints & Assumptions
- Catalog has 100M+ items and grows continuously — new items are uploaded every second.
- ~100M daily active users; a session loads ~10-20 items at a time and may scroll through hundreds.
- A feed request must return within roughly 100-200 ms at p99.
- Strong long-tail and cold-start pressure: brand-new items and brand-new users appear constantly.
- Available engagement signals: impressions, plays/clicks, watch time, likes, shares, follows, and skips.
- The business goal is long-term engagement / retention, not just maximizing the next click.
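A quick back-of-envelope estimate shows why these constraints force a retrieval-then-ranking design. This is a minimal sketch: only the ~100M DAU figure comes from the constraints above. Requests per user per day, the peak-to-average ratio, and the candidate count are hypothetical assumptions chosen for illustration.

```python
# Back-of-envelope load estimate. Only DAU comes from the stated constraints;
# every other input is a hypothetical assumption for illustration.
DAU = 100_000_000                 # from the constraints: ~100M daily active users
REQUESTS_PER_USER_PER_DAY = 20    # assumption: initial load plus scroll refills
PEAK_TO_AVERAGE = 3               # assumption: evening peak vs. daily average
CANDIDATES_PER_REQUEST = 500      # assumption: "a few hundred" retrieved candidates
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_load():
    """Return (average QPS, peak QPS, peak ranking-model scorings per second)."""
    requests_per_day = DAU * REQUESTS_PER_USER_PER_DAY
    avg_qps = requests_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * PEAK_TO_AVERAGE
    # Every retrieved candidate costs one ranking-model forward pass.
    peak_scorings_per_sec = peak_qps * CANDIDATES_PER_REQUEST
    return avg_qps, peak_qps, peak_scorings_per_sec


avg_qps, peak_qps, peak_scorings = estimate_load()
print(f"avg QPS ~{avg_qps:,.0f}, peak QPS ~{peak_qps:,.0f}, "
      f"peak ranker scorings/s ~{peak_scorings:,.0f}")
```

Under these assumptions the script prints roughly 23k average QPS, 69k peak QPS, and about 35M ranker scorings per second at peak. Scoring all 100M+ items per request would be several orders of magnitude beyond that.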
### Clarifying Questions to Ask
- What is the primary business objective — day-N retention, total watch time, ad revenue, creator growth — and how do we trade these off against each other?
- What item types are in scope (videos, posts, ads, who-to-follow), and is this a single blended feed or several separate rails?
- What is the operating scale (catalog size, DAU, peak QPS) and the hard latency budget per feed request?
- Which signals and logs are available, and how fresh are they (real-time watch-time events vs daily batch)?
- Are there hard constraints to honor — content-safety / policy filtering, diversity, freshness, or fairness to creators?
- How is the feed consumed (infinite scroll vs paginated) and how often must it be recomputed within a session?
### Part 1 — Problem framing and objective
Frame the recommendation task as a machine-learning problem. What exactly are you predicting, what is the label, and how do you translate the business goal into a trainable objective?
```hint Where to start
Separate the business metric (retention, total watch time) from the per-item proxy label you can actually train on — e.g., the probability of a positive engagement, or predicted watch time for a given user-item pair.
```
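One way to make the hint concrete is to turn raw impression logs into a binary proxy label and train with a weighted log loss. This is a minimal sketch under stated assumptions. The `Impression` schema, the 50% completion threshold, and the choice to weight positives by watch time are all hypothetical. Weighting positives by watch time is one common way to push the click/engage probability toward favoring longer watches, but it is not the only option.

```python
import math
from dataclasses import dataclass
from typing import List


# Hypothetical log schema for one impression of one item to one user.
@dataclass
class Impression:
    watch_seconds: float
    item_seconds: float   # item length (video duration or estimated read time)
    liked: bool
    shared: bool
    skipped: bool         # user swiped away quickly


# Assumption: watching at least half of the item counts as a positive engagement.
COMPLETION_THRESHOLD = 0.5


def engagement_label(imp: Impression) -> int:
    """Binary proxy label: 1 for a positive engagement, 0 otherwise."""
    if imp.liked or imp.shared:
        return 1  # explicit positive actions override everything else
    if imp.skipped:
        return 0
    completion = imp.watch_seconds / max(imp.item_seconds, 1e-9)
    return int(completion >= COMPLETION_THRESHOLD)


def example_weight(imp: Impression, label: int) -> float:
    """Weight positives by watch time so longer watches pull harder on the loss."""
    return max(imp.watch_seconds, 1.0) if label == 1 else 1.0


def weighted_log_loss(probs: List[float], labels: List[int], weights: List[float]) -> float:
    """Weighted binary cross-entropy, averaged over the total weight."""
    eps = 1e-7
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)
```

The label and weight are deliberately simple. The point is that the business metric (retention) never appears directly in the loss. It enters only through how the proxy label and weights are chosen and then validated online.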
### Part 2 — Candidate generation (retrieval)
With 100M+ items you cannot score the whole catalog on every request. Design the candidate-generation stage that narrows the catalog down to a few hundred candidates per request.
```hint Two-stage funnel
Think retrieval then ranking. For retrieval, use approximate nearest-neighbor search over learned embeddings (a two-tower model) alongside a few heuristic sources.
```
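The retrieval side of a two-tower model can be sketched in a few lines. This is a toy illustration under stated assumptions. `user_tower` is a hypothetical placeholder that averages recently engaged item embeddings in place of a learned user tower. `retrieve_top_k` does exact cosine top-k with `heapq.nlargest`, standing in for an approximate nearest-neighbor index (HNSW, IVF, or similar), which is what a 100M-item catalog would actually need.

```python
import heapq
import math
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    norm = math.sqrt(dot(v, v)) or 1.0  # guard against the zero vector
    return [x / norm for x in v]


def user_tower(history: Iterable[str], item_vecs: Dict[str, Vector]) -> Optional[List[float]]:
    """Placeholder user tower: mean of the embeddings of recently engaged items."""
    vecs = [item_vecs[i] for i in history if i in item_vecs]
    if not vecs:
        return None  # cold-start user: rely on other candidate sources
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


def retrieve_top_k(user_vec: Vector, item_vecs: Dict[str, Vector], k: int,
                   exclude: Set[str] = frozenset()) -> List[Tuple[str, float]]:
    """Exact cosine top-k over the catalog; stands in for an ANN index lookup.

    In production, item vectors would be normalized once offline, not per query.
    """
    u = normalize(user_vec)
    scored = ((dot(u, normalize(v)), item_id)
              for item_id, v in item_vecs.items() if item_id not in exclude)
    return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]
```

The key property is that item embeddings do not depend on the user. The whole catalog can therefore be embedded and indexed offline, and only the user vector has to be computed near request time.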
```hint Cold start
Use several complementary candidate sources so new items and new users aren't starved: content-based embeddings, trending/fresh, and the follow graph.
```
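Combining several candidate sources usually means interleaving them under per-source quotas and deduplicating, so that a dominant source (such as the two-tower model) cannot crowd out fresh items or the follow graph. Below is a minimal round-robin sketch. The source names and quota values are hypothetical.

```python
from typing import Dict, List


def merge_candidate_sources(sources: Dict[str, List[str]],
                            quotas: Dict[str, int],
                            total: int) -> List[str]:
    """Round-robin merge of ranked source lists with per-source quotas and dedup."""
    merged: List[str] = []
    seen = set()
    taken = {name: 0 for name in sources}
    cursors = {name: 0 for name in sources}
    progress = True
    while len(merged) < total and progress:
        progress = False
        for name, items in sources.items():
            if len(merged) >= total:
                break
            if taken[name] >= quotas.get(name, 0):
                continue  # this source has used up its quota
            # Skip items another source already contributed.
            while cursors[name] < len(items) and items[cursors[name]] in seen:
                cursors[name] += 1
            if cursors[name] < len(items):
                item = items[cursors[name]]
                cursors[name] += 1
                merged.append(item)
                seen.add(item)
                taken[name] += 1
                progress = True
    return merged


# Hypothetical sources and quotas for one request.
candidates = merge_candidate_sources(
    sources={"two_tower": ["a", "b", "c", "d"],
             "fresh": ["new1", "b", "new2"],
             "follow": ["f1"]},
    quotas={"two_tower": 3, "fresh": 2, "follow": 1},
    total=10,
)
```

Here `candidates` is `["a", "new1", "f1", "b", "new2", "c"]`: the fresh source still gets two slots even though one of its items (`"b"`) was already supplied by the two-tower source.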
### Part 3 — Ranking
Given the few hundred retrieved candidates, design the ranking model that orders them for this specific user in this context.
```hint Features
Combine user features, item features, and user×item cross / interaction features; the sequence of the user's most recent in-session actions is an especially strong signal.
```
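One way to turn the recent-action sequence into a feature is target attention: weight each recent action by how similar it is to the candidate being scored, then pool. The sketch below is a simplified dot-product/softmax variant in the spirit of attention-based CTR models, not any specific published architecture. `build_ranking_features` is a hypothetical helper showing how user, item, and user x item features might be concatenated into one input vector.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def target_attention_pool(candidate: Sequence[float],
                          recent: List[Sequence[float]]) -> List[float]:
    """Softmax-weighted average of recent-action embeddings, weighted by
    dot-product similarity to the candidate item."""
    if not recent:
        return [0.0] * len(candidate)  # no session history yet
    weights = softmax([sum(c * r for c, r in zip(candidate, emb)) for emb in recent])
    return [sum(w * emb[d] for w, emb in zip(weights, recent))
            for d in range(len(candidate))]


def build_ranking_features(user_feats: Sequence[float], item_feats: Sequence[float],
                           candidate_emb: Sequence[float],
                           recent_embs: List[Sequence[float]]) -> List[float]:
    """Concatenate user, item, and user x item features into one model input."""
    pooled = target_attention_pool(candidate_emb, recent_embs)
    affinity = sum(c * p for c, p in zip(candidate_emb, pooled))  # cross feature
    return list(user_feats) + list(item_feats) + pooled + [affinity]
```

Because the pooling depends on the candidate, the same session history produces a different summary for each of the few hundred candidates. This is part of why the sequence signal is so strong, and also why it is too expensive to use at the retrieval stage.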
```hint Multi-objective
A single click-probability score under-serves watch time and diversity. Consider a multi-task model whose heads (click, watch time, like, share) are blended into one final score.
```
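The blending step can be as simple as a weighted sum of the head outputs. This is a minimal sketch. The `HeadOutputs` fields mirror the heads named in the hint, but the weights in `BLEND_WEIGHTS` are hypothetical. In practice they would be tuned through online experiments against the north-star metric rather than fixed by hand.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HeadOutputs:
    item_id: str
    p_click: float
    expected_watch_seconds: float
    p_like: float
    p_share: float


# Hypothetical weights; the watch weight also rescales seconds to a range
# comparable with the probability heads.
BLEND_WEIGHTS: Dict[str, float] = {"click": 1.0, "watch": 0.05, "like": 2.0, "share": 4.0}


def blended_score(h: HeadOutputs, w: Dict[str, float] = BLEND_WEIGHTS) -> float:
    """Collapse the multi-task head outputs into one ranking score."""
    return (w["click"] * h.p_click
            + w["watch"] * h.expected_watch_seconds
            + w["like"] * h.p_like
            + w["share"] * h.p_share)


def rank(candidates: List[HeadOutputs]) -> List[str]:
    return [h.item_id for h in sorted(candidates, key=blended_score, reverse=True)]


clickbait = HeadOutputs("clickbait", p_click=0.9, expected_watch_seconds=2.0, p_like=0.01, p_share=0.0)
deep = HeadOutputs("deep", p_click=0.5, expected_watch_seconds=40.0, p_like=0.1, p_share=0.05)
```

With these made-up numbers, a click-only score puts `clickbait` first (0.9 vs. 0.5). The blend scores `deep` at about 2.9 and `clickbait` at about 1.02, so `rank([clickbait, deep])` returns `["deep", "clickbait"]`.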
### Part 4 — Serving, scale, and freshness
Describe the online serving architecture that returns a ranked feed within the latency budget at this scale, and explain how features and models stay fresh.
```hint Precompute vs online
Precompute and cache whatever you can offline or near-line (item embeddings, user embeddings, ANN index) so only light per-request work happens on the hot path.
```
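On the hot path, precomputed artifacts are read rather than computed. The sketch below shows one small piece of that: a TTL cache in front of a near-line user-embedding store, with a default embedding for users who have none yet. The `loader` callable stands in for a hypothetical feature-store read, and the 60-second TTL is an assumption. A real deployment would more likely use a shared cache such as Redis than an in-process dict.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Embedding = List[float]


class TTLCache:
    """In-process cache that reloads an entry once it is older than ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 loader: Callable[[str], Optional[Embedding]],
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.loader = loader          # hypothetical near-line feature-store read
        self.clock = clock            # injectable for testing
        self._store: Dict[str, Tuple[float, Embedding]] = {}

    def get(self, key: str) -> Optional[Embedding]:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = self.loader(key)
        # Misses are not cached, so a brand-new user's embedding is picked up
        # as soon as the near-line pipeline writes it.
        if value is not None:
            self._store[key] = (now, value)
        return value


def user_embedding_for_request(cache: TTLCache, user_id: str,
                               default: Embedding) -> Embedding:
    """Return the cached user embedding, or a default for cold-start users."""
    emb = cache.get(user_id)
    return emb if emb is not None else default
```

The TTL bounds staleness: a user embedding refreshed near-line every few minutes is served at most one TTL late, while the per-request cost stays at a dictionary lookup on a cache hit.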
### Part 5 — Evaluation, experimentation, and monitoring
How do you know the system is good, and how do you safely ship changes to it?
```hint Offline vs online
Offline metrics (AUC, NDCG, recall@k) only loosely correlate with the business metric — gate real launches on online A/B tests of the north-star metric.
```
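Two of the offline metrics named in the hint fit naturally on the two stages. Recall@k measures whether retrieval surfaced the items the user later engaged with. NDCG@k measures whether ranking put the most valuable of them near the top. This is a minimal sketch. Using graded gains such as watch time for NDCG is an assumption; binary gains work the same way.

```python
import math
from typing import Dict, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with the standard log2(position + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """DCG of the model's top k divided by the DCG of the ideal ordering."""
    actual = dcg([gains.get(item, 0.0) for item in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

For example, with `ranked = ["a", "b", "c", "d"]` and relevant items `{"a", "d"}`, recall@2 is 0.5. With unit gains on `"a"` and `"d"`, NDCG@2 is about 0.61, because only one of the two relevant items made the top 2. As the hint notes, improvements in either number are a reason to run an A/B test, not a substitute for one.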
### Follow-up Questions
- How would you serve a brand-new user with zero history on their very first session, and how does the experience evolve over their first few interactions?
- Retrieval and ranking disagree often (good candidates rank low, or weak candidates dominate). How do you debug whether the bottleneck is retrieval or ranking?
- Engagement is up but long-term retention is flat or declining. How do you detect and fix a feedback loop that is over-promoting clickbait?
- How would you introduce a new objective — say creator-side fairness or content diversity — without retraining the entire stack from scratch?
Quick Answer: This question evaluates a candidate's ability to design a large-scale personalized recommendation system, covering problem framing, candidate retrieval, ranking, and serving under tight latency constraints. It tests knowledge of machine learning system design, including multi-objective modeling, cold-start handling, and production evaluation, at a practical, applied level typical of ML system design interviews.