You are designing the candidate-generation (retrieval) and recommendation system for a short-video app.
Constraints and setting:
- Users can search with a text query (e.g., “funny cat fails”), and the system should retrieve relevant short videos.
- Only ~20% of videos have reliable text metadata (title/description/hashtags). The rest may have only visual/audio signals.
- You must support low-latency online retrieval at large scale.
Tasks:
- Propose an end-to-end architecture for query-to-video retrieval and show how it fits into a full recommender stack (retrieval → ranking → re-ranking).
- Explain how you would represent videos (multi-modal features) and queries, and how you would handle the 80% of videos without text.
- Describe offline training, online serving, indexing/ANN choices, and how you would evaluate retrieval quality.
- Discuss how you would mitigate popularity bias in retrieval/recommendation while keeping relevance and engagement strong.
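
To make the retrieval and popularity-bias tasks concrete, here is a toy sketch and not a prescribed answer. It assumes queries and videos already live in a shared embedding space, uses brute-force cosine scoring as a stand-in for an ANN index, and applies one possible debiasing heuristic: subtracting `alpha * log(1 + views)` from the similarity score. The names `retrieve`, `VIDEOS`, and `alpha`, the 2-d embeddings, and the view counts are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, videos, k=2, alpha=0.0):
    """Return the ids of the top-k videos.

    Score = cosine(query, video) - alpha * log(1 + views).
    alpha = 0 ranks by similarity alone; a larger alpha (an assumed tuning
    knob) penalizes heavily viewed videos more strongly.
    """
    scored = []
    for video_id, (vec, views) in videos.items():
        score = cosine(query_vec, vec) - alpha * math.log1p(views)
        scored.append((score, video_id))
    # Highest score first; ties are broken by id for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [video_id for _, video_id in scored[:k]]


# Hypothetical toy catalog: id -> (embedding, view count).
VIDEOS = {
    "cat_fail_viral": ([1.0, 0.0], 1_000_000),
    "cat_fail_niche": ([0.9, 0.1], 100),
    "cooking_viral": ([0.0, 1.0], 2_000_000),
}
QUERY = [1.0, 0.0]  # stand-in embedding for the query "funny cat fails"

print(retrieve(QUERY, VIDEOS, k=1))              # similarity only
print(retrieve(QUERY, VIDEOS, k=1, alpha=0.05))  # with popularity penalty
```

With `alpha=0` the viral video wins on similarity alone. With `alpha=0.05` the nearly as relevant niche video moves to the top, while the off-topic viral video still ranks last. Balancing that relevance versus exposure trade-off is what the last task asks about.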