Design short-video retrieval with sparse text
Company: Snapchat
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: medium
Interview Round: Technical Screen
You are designing the candidate-generation (retrieval) and recommendation system for a short-video app.
Constraints and setting:
- Users can search with a **text query** (e.g., “funny cat fails”), and the system should retrieve relevant **short videos**.
- Only **~20% of videos have reliable text metadata** (title/description/hashtags). The rest may have only visual/audio signals.
- You must support low-latency online retrieval at large scale.
Tasks:
1) Propose an end-to-end architecture for **query-to-video retrieval** and how it fits into a full recommender stack (retrieval → ranking → re-ranking).
2) Explain how you would represent videos (multi-modal features) and queries, and how you would handle the 80% of videos without text.
3) Describe offline training, online serving, indexing/ANN choices, and how you would evaluate retrieval quality.
4) Discuss how you would mitigate **popularity bias** in retrieval/recommendation while keeping relevance and engagement strong.
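As a point of reference for tasks 1 to 3, the sketch below shows one way retrieval and its evaluation could look once queries and videos share an embedding space. It is a minimal illustration, not a prescribed answer. It assumes a text encoder and a video (visual/audio) encoder already exist and produce vectors in the same space, so videos without text are still retrievable through their content embedding. The helper names `l2_normalize`, `top_k`, and `recall_at_k` and the toy vectors are hypothetical, and the exact cosine search stands in for whatever ANN index a real system would use.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def top_k(query_vec, video_matrix, k):
    """Exact (brute-force) top-k by cosine similarity.

    At production scale an approximate nearest-neighbor index would
    replace this scan; the interface (query in, video ids out) stays the same.
    """
    scores = l2_normalize(video_matrix) @ l2_normalize(query_vec)
    k = min(k, scores.shape[0])
    if k <= 0:
        return np.array([], dtype=int)
    # Select the k best candidates, then order them by descending score.
    idx = np.argpartition(-scores, k - 1)[:k]
    return idx[np.argsort(-scores[idx])]

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant videos that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Toy data (made up): four video embeddings and one query embedding.
video_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
query_embedding = np.array([1.0, 0.0])

retrieved = top_k(query_embedding, video_embeddings, k=2).tolist()
recall = recall_at_k(retrieved, relevant_ids=[0, 2], k=2)
```

Recall@k is only one of several reasonable offline retrieval metrics; it is used here because it is indifferent to the order within the top k, which matches a stage whose output is later re-scored by a ranker.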
Quick Answer: This question evaluates ML system design skills for large-scale information retrieval and recommendation, with emphasis on multi-modal video representation, handling sparse text metadata, indexing/ANN choices, low-latency online serving, and mitigating popularity bias.
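One commonly discussed option for the popularity-bias part is a sampling-bias ("logQ") correction on in-batch negatives when training a two-tower retriever. The sketch below illustrates that idea under stated assumptions and is not the only valid approach. Popular videos appear more often as in-batch negatives, so subtracting the log of each video's estimated sampling probability from its logit compensates for that over-representation. The function name `in_batch_softmax_loss`, the `temperature` default, and the toy inputs are hypothetical.

```python
import numpy as np

def in_batch_softmax_loss(query_emb, video_emb, sampling_prob, temperature=0.1):
    """In-batch softmax loss with a logQ correction.

    query_emb:     [B, D] query embeddings
    video_emb:     [B, D] embeddings of the positive video for each query
    sampling_prob: [B] estimated probability that each video shows up in a
                   batch (an assumption: in practice estimated from
                   impression or interaction frequency)
    """
    # Row i scores query i against every video in the batch.
    logits = (query_emb @ video_emb.T) / temperature
    # logQ correction: subtract log sampling probability per video column.
    logits = logits - np.log(sampling_prob)[None, :]
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for query i sits on the diagonal.
    return float(-np.mean(np.diag(log_softmax)))

# Toy batch (made up): two queries, each matched to its own video.
queries = np.eye(2)
videos = np.eye(2)

loss_uniform = in_batch_softmax_loss(queries, videos, np.array([0.5, 0.5]), temperature=1.0)
loss_skewed = in_batch_softmax_loss(queries, videos, np.array([0.9, 0.1]), temperature=1.0)
```

When all sampling probabilities are equal, the correction shifts every logit by the same constant and the loss is unchanged, so the adjustment only has an effect when popularity is actually skewed. Serving-side measures such as diversity-aware re-ranking or exploration slots are complementary and are not shown here.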