ML System Design: Thumbnail Selection for a Streaming Catalog
Company: Tubitv
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: medium
Interview Round: Technical Screen
# ML System Design: Thumbnail Selection for a Streaming Catalog
You work at a video streaming service. For each title in the catalog (movie or show), there are multiple candidate thumbnail images — for example, frames automatically sampled from the video, plus a few editorially produced artwork variants. Given a title and its set of candidate thumbnails, design a machine learning system that selects which thumbnail to show each user so as to maximize engagement (the user clicking the title and starting to watch).
Design the end-to-end system: how you frame the problem, what data and labels you use, the model, how you evaluate it offline and online, how you serve it at scale, and how you monitor it in production. Discuss whether and how you would personalize the choice per user versus picking one globally best thumbnail per title.
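
As a reference point for the "one globally best thumbnail per title" option, a minimal sketch of a non-personalized baseline is shown below: Beta-Bernoulli Thompson sampling over one title's candidates. The class name `TitleThumbnailSampler`, the uniform `Beta(1, 1)` prior, and the handling of fractional rewards are illustrative assumptions, not part of the question; a personalized design would replace this with a contextual model.

```python
import random


class TitleThumbnailSampler:
    """Beta-Bernoulli Thompson sampling over one title's candidate thumbnails.

    Hypothetical helper for illustration; it ignores user context entirely.
    """

    def __init__(self, thumbnail_ids, prior_alpha=1.0, prior_beta=1.0, seed=None):
        self._rng = random.Random(seed)
        # [alpha, beta] pseudo-counts of successes and failures per thumbnail.
        self._params = {t: [prior_alpha, prior_beta] for t in thumbnail_ids}

    def select(self):
        """Draw one plausible success rate per thumbnail and return the argmax."""
        draws = {
            t: self._rng.betavariate(a, b) for t, (a, b) in self._params.items()
        }
        return max(draws, key=draws.get)

    def update(self, thumbnail_id, reward):
        """Fold in a reward in [0, 1].

        For binary rewards this is the exact conjugate update. Splitting a
        fractional reward across both counts is a common heuristic, assumed here.
        """
        if not 0.0 <= reward <= 1.0:
            raise ValueError("reward must lie in [0, 1]")
        counts = self._params[thumbnail_id]
        counts[0] += reward
        counts[1] += 1.0 - reward

    def posterior_mean(self, thumbnail_id):
        """Current posterior mean success rate for one thumbnail."""
        a, b = self._params[thumbnail_id]
        return a / (a + b)
```

A brand-new thumbnail starts at the prior, so it is sampled often while its posterior is wide and less often once evidence accumulates. That is one way to reason about cold start before adding any personalization.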
### Constraints & Assumptions
- Catalog on the order of $10^5$ titles; each title has roughly 5-30 candidate thumbnails.
- Tens of millions of users; the home/browse screen renders many titles per impression.
- The thumbnail must be chosen at serve time within the page-render latency budget (single-digit to low tens of milliseconds for the ranking/selection step).
- New titles and freshly generated candidate thumbnails appear continuously (cold start).
- The primary engagement signal combines the click with subsequent viewing behavior; raw clicks alone may not capture genuine interest.
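
To make the last constraint concrete, here is one possible way to turn a click plus viewing behavior into a single reward. It is a sketch under stated assumptions: the `Impression` schema, the 120-second bounce threshold, and the "half the runtime earns full credit" rule are all hypothetical choices that a real team would tune or replace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """One logged thumbnail impression (hypothetical schema)."""
    clicked: bool
    watch_seconds: float   # playback time after the click, 0 if no click
    title_seconds: float   # runtime of the title or episode


def engagement_reward(imp: Impression,
                      min_watch_seconds: float = 120.0,
                      full_credit_fraction: float = 0.5) -> float:
    """Map an impression to a reward in [0, 1].

    No click gives 0. A click followed by a very short play is treated as a
    bounce and also gives 0. Otherwise credit grows linearly with the fraction
    of the runtime watched and saturates at 1.0.
    """
    if not imp.clicked or imp.watch_seconds < min_watch_seconds:
        return 0.0
    if imp.title_seconds <= 0:
        return 0.0  # runtime unknown or invalid: give no credit
    fraction_watched = imp.watch_seconds / imp.title_seconds
    return min(1.0, fraction_watched / full_credit_fraction)
```

Under these assumed defaults, a click followed by 30 seconds of playback earns nothing, while a click followed by a quarter of the runtime earns half credit. A thumbnail that wins clicks but not viewing therefore scores poorly.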
### Clarifying Questions to Ask
- What engagement signal should the system optimize, and what are the risks of optimizing a coarse signal (raw clicks) versus a more nuanced one (clicks followed by sustained viewing)?
- Is the thumbnail choice per-user personalized, or should there be one globally winning thumbnail per title? What is the appetite for personalization complexity?
- How are candidate thumbnails generated, and how many per title? Is there an editorial or brand constraint on which images are eligible?
- What is the serving latency budget for the thumbnail decision within the page render?
- How quickly must a brand-new title or a new candidate image start being shown well (cold-start expectations)?
- Are there fairness or quality guardrails (e.g., no misleading frames, content-appropriateness)?
### Follow-up Questions
- Your logged data only contains feedback for the thumbnail that was actually shown. How do you train or evaluate a model that wants to reason about thumbnails that were *never* shown for a given user? (Discuss exploration and off-policy/counterfactual evaluation.)
- A naive click-maximizing model starts surfacing sensational, slightly misleading frames that get clicks but low completion. How do you detect this and change the objective to prevent it?
- A brand-new title enters the catalog with five never-seen thumbnails and zero engagement data. Walk through exactly how the system behaves for the first hours and days.
- How would you decide whether per-user personalization is actually worth the added complexity over a single global best thumbnail per title? What experiment would settle it?
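
For the first follow-up, one standard starting point is inverse propensity scoring (IPS) and its self-normalized variant (SNIPS). These estimate how a new policy would have performed from logs collected under a randomized logging policy. The sketch below assumes each log row records the probability with which the shown thumbnail was chosen. The `LoggedEvent` schema, the `target_prob` callback, and the weight cap of 50 are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping, Tuple


@dataclass(frozen=True)
class LoggedEvent:
    """One logged impression (hypothetical schema)."""
    context: Mapping[str, float]  # user/title features at serve time
    shown: str                    # id of the thumbnail actually displayed
    propensity: float             # P(shown | context) under the logging policy
    reward: float                 # observed reward for the shown thumbnail


def ips_estimates(
    events: Iterable[LoggedEvent],
    target_prob: Callable[[Mapping[str, float], str], float],
    max_weight: float = 50.0,
) -> Tuple[float, float]:
    """Return (IPS, SNIPS) estimates of the target policy's mean reward.

    target_prob(context, thumbnail_id) is the probability that the policy
    being evaluated would show that thumbnail in that context. Importance
    weights are clipped at max_weight, trading some bias for lower variance.
    """
    weighted_reward_sum = 0.0
    weight_sum = 0.0
    n = 0
    for e in events:
        if e.propensity <= 0.0:
            raise ValueError("logged propensity must be positive")
        w = min(max_weight, target_prob(e.context, e.shown) / e.propensity)
        weighted_reward_sum += w * e.reward
        weight_sum += w
        n += 1
    if n == 0:
        raise ValueError("no logged events")
    ips = weighted_reward_sum / n
    snips = weighted_reward_sum / weight_sum if weight_sum > 0.0 else 0.0
    return ips, snips
```

Both estimators only work if the logging policy gave every eligible thumbnail a nonzero chance of being shown, which is why some exploration traffic is usually a prerequisite for this kind of evaluation.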
Quick Answer: This ML system design question tests a candidate's ability to architect an end-to-end personalization and ranking system, covering problem framing, reward modeling, and low-latency serving at scale. It evaluates practical knowledge of contextual bandits, engagement-signal design, cold-start handling, and off-policy evaluation — core competencies for machine learning engineering roles.