Retrieval And Candidate Generation For Meta Ranking
Asked of: Data Scientist
Last updated
-
What's being tested
Ability to design and evaluate scalable, low-latency candidate-generation systems: choosing retrieval algorithms (sparse/dense/ANN), balancing recall vs latency, handling freshness/cold-start, and defining offline and online metrics. -
Core knowledge
- Two-stage architecture: cheap candidate generation (high recall) then expensive re-ranking (precision).
- Sparse retrieval: inverted index, BM25, popularity/CTR baselines for cold-start.
- Dense retrieval: dual-encoder (two-tower) embedding retrieval, trained with in-batch and hard negatives.
- ANN systems: FAISS (IVF, PQ), HNSW, tradeoffs between recall, memory, and latency.
- MIPS: convert dot-product to ANN-friendly forms or use HNSW/PQ tuned for inner-product.
- Evaluation: recall@K, MRR@K, NDCG@K offline; A/B metrics: engagement, retention, long-term value.
- Engineering constraints: QPS, 99th-percentile tail latency, incremental indexing, sharding, caching, deduplication.
-
Worked example — "Design a candidate generation system for a personalized news feed" (framing)
Start by clarifying constraints: QPS, per-request latency budget (e.g., 50–200ms), update frequency (minutes vs seconds), item corpus size, and available signals (user history, content embeddings). Propose a hybrid pipeline: lightweight popularity/BM25 cold-start layer plus dense two-tower retrieval for personalization. Choose ANN index (FAISS HNSW or IVF+PQ) tuned for recall@100 within latency budget; plan incremental indexing or a lightweight streaming layer for fresh items. Define training: two-tower trained with in-batch and hard negatives from logs; evaluate offline on recall@100 and online via A/B testing of engagement and retention. Finally, describe operational considerations: sharding, replication, cache hot items, and monitoring recall drift and freshness. -
A common pitfall
Focusing only on offline accuracy (e.g., NDCG gains) while ignoring engineering constraints is tempting. That yields models that can't meet tail-latency, freshness, or QPS requirements. Also avoid assuming dense embeddings alone fix cold-start — popularity/sparse signals and incremental indexing are often necessary in production. -
Further reading
- Johnson, Douze, Jégou. "Billion-scale similarity search with GPUs" (FAISS).
- Malkov & Yashunin. "Efficient and robust approximate nearest neighbor search using HNSW."