Retrieval And Candidate Generation For Meta Ranking — Tech Interview Concept

What's being tested
Ability to design and evaluate scalable, low-latency candidate-generation systems: choosing retrieval algorithms (sparse/dense/ANN), balancing recall vs latency, handling freshness/cold-start, and defining offline and online metrics.
Core knowledge

Two-stage architecture: cheap candidate generation (high recall) then expensive re-ranking (precision).
Sparse retrieval: inverted index, BM25, popularity/CTR baselines for cold-start.
Dense retrieval: dual-encoder (two-tower) embedding retrieval, trained with in-batch and hard negatives.
ANN systems: FAISS (IVF, PQ), HNSW, tradeoffs between recall, memory, and latency.
MIPS: convert dot-product to ANN-friendly forms or use HNSW/PQ tuned for inner-product.
Evaluation: recall@K, MRR@K, NDCG@K offline; A/B metrics: engagement, retention, long-term value.
Engineering constraints: QPS, 99th-percentile tail latency, incremental indexing, sharding, caching, deduplication.

Worked example — "Design a candidate generation system for a personalized news feed" (framing)
Start by clarifying constraints: QPS, per-request latency budget (e.g., 50–200ms), update frequency (minutes vs seconds), item corpus size, and available signals (user history, content embeddings). Propose a hybrid pipeline: lightweight popularity/BM25 cold-start layer plus dense two-tower retrieval for personalization. Choose ANN index (FAISS HNSW or IVF+PQ) tuned for recall@100 within latency budget; plan incremental indexing or a lightweight streaming layer for fresh items. Define training: two-tower trained with in-batch and hard negatives from logs; evaluate offline on recall@100 and online via A/B testing of engagement and retention. Finally, describe operational considerations: sharding, replication, cache hot items, and monitoring recall drift and freshness.
A common pitfall
Focusing only on offline accuracy (e.g., NDCG gains) while ignoring engineering constraints is tempting. That yields models that can't meet tail-latency, freshness, or QPS requirements. Also avoid assuming dense embeddings alone fix cold-start — popularity/sparse signals and incremental indexing are often necessary in production.
Further reading

Johnson, Douze, Jégou. "Billion-scale similarity search with GPUs" (FAISS).
Malkov & Yashunin. "Efficient and robust approximate nearest neighbor search using HNSW."

Related concepts