Recommender And Ranking System Design — Tech Interview Concept

What's being tested
Ability to design scalable, low-latency recommendation/ranking pipelines that balance short‑term engagement and long‑term value, using appropriate algorithms, metrics, and evaluation (offline vs online). Expect tradeoffs across candidate generation, scoring, feature freshness, and experimentation.
Core knowledge

Typical pipeline: candidate generation → coarse scoring → fine-grained ranking → re-ranking/filters.
Algorithms: two‑tower (DSSM), matrix factorization/ALS, BPR, pairwise ranking, GBDTs, neural ranking (YouTube DNN).
Key metrics: precision@k, recall@k, NDCG, and MRR for offline ranking quality; CTR, DAU, and retention online; plus calibration checks and position-bias adjustments when interpreting either.
Exploration vs exploitation: contextual bandits, Thompson sampling, and epsilon‑greedy, used to add serendipity and handle cold start.
Engineering constraints: an end-to-end latency budget of roughly 100–300 ms, embedding table memory, feature freshness, and incremental model updates.
Bias correction: propensity estimation and inverse propensity weighting to debias logged feedback (position, exposure, and selection bias).
A/B testing: statistical power, slicing, guardrail metrics, and off-policy (counterfactual) estimators on logged data (IPW, doubly robust) to screen policies before they go online.
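To make the BPR entry concrete, here is a minimal sketch of matrix factorization trained with the BPR pairwise objective: for a user u, an observed item i, and a sampled unobserved item j, it takes a gradient-ascent step on ln σ(x_ui − x_uj) with L2 regularization, where x_ui is the dot product of user and item vectors. The function names, toy data, and hyperparameters (dim, lr, reg, epochs) are illustrative assumptions, not values from the paper.

```python
import math
import random
from typing import Dict, List, Set

def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def bpr_step(w_u: List[float], h_i: List[float], h_j: List[float],
             lr: float = 0.05, reg: float = 0.01) -> float:
    """One SGD step on ln sigmoid(x_ui - x_uj) with L2 regularization.

    Vectors are updated in place; returns the loss -ln sigmoid(x_uij) before the update.
    """
    x_uij = dot(w_u, h_i) - dot(w_u, h_j)
    # Derivative of ln sigmoid(x) is sigmoid(-x) = 1 / (1 + e^x).
    g = 1.0 / (1.0 + math.exp(x_uij))
    for f in range(len(w_u)):
        u, i, j = w_u[f], h_i[f], h_j[f]
        w_u[f] += lr * (g * (i - j) - reg * u)   # d x_uij / d w_u = h_i - h_j
        h_i[f] += lr * (g * u - reg * i)         # d x_uij / d h_i = w_u
        h_j[f] += lr * (-g * u - reg * j)        # d x_uij / d h_j = -w_u
    return math.log1p(math.exp(-x_uij))

def train_bpr(interactions: Dict[int, Set[int]], n_users: int, n_items: int,
              dim: int = 4, epochs: int = 300, seed: int = 0):
    """Train user (P) and item (Q) embeddings from implicit positives with uniform negative sampling."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    pairs = [(u, i) for u, items in interactions.items() for i in items]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for u, i in pairs:
            j = rng.randrange(n_items)
            while j in interactions[u]:  # resample until j is unobserved for u
                j = rng.randrange(n_items)
            bpr_step(P[u], Q[i], Q[j])
    return P, Q
```

Usage: with `train_bpr({0: {0, 1}, 1: {2, 3}}, n_users=2, n_items=4)`, user 0 should end up scoring items 0 and 1 above items 2 and 3. The loss only cares about the relative order of observed vs unobserved items, which is why it suits implicit feedback where non-clicks are not true negatives.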
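The offline ranking metrics listed above can be sketched in a few lines. This assumes binary relevance for precision, recall, and MRR, graded gains for NDCG, and the common log2(rank + 1) discount; other discount and gain conventions exist.

```python
import math
from typing import Dict, Hashable, List, Sequence, Set

def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k positions occupied by relevant items."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def ndcg_at_k(ranked: Sequence[Hashable], gains: Dict[Hashable, float], k: int) -> float:
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ordering of the same gains."""
    def dcg(values: List[float]) -> float:
        # enumerate starts at 0, so pos + 2 equals rank + 1 for 1-based ranks
        return sum(g / math.log2(pos + 2) for pos, g in enumerate(values))
    actual = dcg([gains.get(item, 0.0) for item in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

def mean_reciprocal_rank(rankings: List[Sequence[Hashable]],
                         relevant_sets: List[Set[Hashable]]) -> float:
    """Average over queries of 1 / rank of the first relevant item (0 if none is retrieved)."""
    total = 0.0
    for ranked, relevant in zip(rankings, relevant_sets):
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)
```

For `ranked = ["a", "b", "c", "d"]` with relevant items `{"b", "d"}`, precision@2 and recall@2 are both 0.5 and NDCG@4 is about 0.65.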
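For the exploration entry, a minimal non-contextual sketch: Beta-Bernoulli Thompson sampling (one Beta(1, 1) prior per arm, updated with click/no-click) alongside epsilon-greedy. A contextual bandit would replace the per-arm Beta posteriors with a model conditioned on user and item features; that extension is not shown. The class, function names, and simulated CTRs are hypothetical.

```python
import random
from typing import List

class BetaBernoulliTS:
    """Thompson sampling over K arms with a Beta(1, 1) prior on each arm's click probability."""

    def __init__(self, n_arms: int, seed: int = 0):
        self.alpha = [1.0] * n_arms  # 1 + observed clicks
        self.beta = [1.0] * n_arms   # 1 + observed non-clicks
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Draw one plausible CTR per arm from its posterior and play the best draw.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm: int, clicked: bool) -> None:
        if clicked:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

def epsilon_greedy(estimates: List[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon explore a uniformly random arm, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=estimates.__getitem__)

def simulate_ts(true_ctrs: List[float], rounds: int, seed: int = 0) -> List[int]:
    """Run Thompson sampling against simulated Bernoulli clicks; return pull counts per arm."""
    env = random.Random(seed + 1)
    bandit = BetaBernoulliTS(len(true_ctrs), seed=seed)
    pulls = [0] * len(true_ctrs)
    for _ in range(rounds):
        arm = bandit.select()
        pulls[arm] += 1
        bandit.update(arm, env.random() < true_ctrs[arm])
    return pulls
```

With simulated CTRs of 0.1, 0.3, and 0.5, Thompson sampling should concentrate most pulls on the third arm while still occasionally sampling the others, which is the behavior that gives new or uncertain items some exposure.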
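The embedding-memory constraint is mostly arithmetic. This sketch assumes a hypothetical catalog of 100 million items with 128-dimensional embeddings and counts only the raw table, ignoring ANN index structures, replication, and training-time optimizer state (Adam, for instance, keeps two extra values per parameter).

```python
def embedding_table_bytes(num_rows: int, dim: int, bytes_per_value: int) -> int:
    """Raw size of a dense embedding table, excluding indexes, replicas, and optimizer state."""
    return num_rows * dim * bytes_per_value

# Hypothetical sizing: 100M items x 128 dims at different precisions.
for label, width in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    gb = embedding_table_bytes(100_000_000, 128, width) / 1e9
    print(f"{label}: {gb:.1f} GB")
```

At fp32 this is 51.2 GB, which is why quantization, hashing tricks, or sharding across servers come up in this part of the discussion.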
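For the bias-correction and counterfactual entries, here is a sketch of the inverse propensity scoring (IPS/IPW) and doubly robust (DR) estimators for the value of a target policy, computed from logs that record the logging policy's propensity for each shown action. The log tuple layout, `target_prob`, and `reward_model` are assumed interfaces for illustration; production versions usually also clip large weights.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# (context, action, reward, logging-policy propensity of that action)
LogRecord = Tuple[Hashable, int, float, float]

def ips_estimate(logs: List[LogRecord],
                 target_prob: Callable[[Hashable, int], float]) -> float:
    """IPS: reweight each logged reward by pi_target(a|x) / pi_logging(a|x)."""
    total = 0.0
    for x, a, r, p in logs:
        total += target_prob(x, a) / p * r
    return total / len(logs)

def dr_estimate(logs: List[LogRecord],
                target_prob: Callable[[Hashable, int], float],
                reward_model: Callable[[Hashable, int], float],
                actions: Sequence[int]) -> float:
    """Doubly robust: model-based value plus an IPS-weighted correction of the model's residual."""
    total = 0.0
    for x, a, r, p in logs:
        # Expected reward of the target policy under the reward model.
        baseline = sum(target_prob(x, b) * reward_model(x, b) for b in actions)
        # Importance-weighted residual on the action that was actually logged.
        correction = target_prob(x, a) / p * (r - reward_model(x, a))
        total += baseline + correction
    return total / len(logs)
```

DR stays unbiased if either the propensities or the reward model is correct; with a reward model that always predicts 0 it reduces exactly to IPS, and with a perfect reward model the correction term vanishes.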
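Power analysis for a CTR experiment can be sketched with the standard normal-approximation sample size for a two-sided, two-proportion z-test. This assumes independent Bernoulli observations per randomization unit and unpooled variance; clustered impressions per user would need a variance adjustment not shown here.

```python
import math
from statistics import NormalDist

def samples_per_arm(p_control: float, p_treatment: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size to detect p_control -> p_treatment with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)
```

Detecting a lift from 10% to 11% CTR needs roughly 15,000 units per arm; halving the effect size roughly quadruples that, which is why small ranking improvements require large or long-running tests.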

Worked example — "Design a social feed recommender"
Start by scoping: user scale, latency budget, and primary metric (e.g., 7‑day retention vs immediate CTR). Sketch the pipeline: retrieval (user/item embeddings, interest graph) → a lightweight scorer to reduce candidates → a heavy neural ranker with cross features → business/IA filters. Enumerate features (recent activity, social graph, content embeddings, time decay), offline metrics (NDCG, offline CTR), and the online strategy (A/B tests vs bandit experiments). Finally, list operational constraints: embedding storage, incremental retraining, and safe-to-fail experiments.
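The funnel from the worked example can be sketched as four stages that progressively shrink the candidate set. Brute-force dot-product retrieval stands in for an approximate nearest-neighbor index, and `coarse_score`, `fine_score`, `blocked`, and the stage sizes are hypothetical placeholders for the lightweight scorer, heavy ranker, filters, and budgets a real system would tune.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Set

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def recommend(user_vec: Sequence[float],
              item_vecs: Dict[str, Sequence[float]],
              coarse_score: Callable[[str], float],
              fine_score: Callable[[str], float],
              blocked: Set[str],
              n_retrieve: int = 500, n_coarse: int = 50, n_final: int = 10) -> List[str]:
    # Stage 1: retrieval by two-tower style dot product (brute force here, ANN in production).
    candidates = heapq.nlargest(n_retrieve, item_vecs,
                                key=lambda i: dot(user_vec, item_vecs[i]))
    # Stage 2: a cheap scorer prunes the candidate set.
    shortlist = heapq.nlargest(n_coarse, candidates, key=coarse_score)
    # Stage 3: the expensive ranker orders the small shortlist.
    ranked = sorted(shortlist, key=fine_score, reverse=True)
    # Stage 4: business/policy filters, then truncate to the slate size.
    return [i for i in ranked if i not in blocked][:n_final]
```

The design choice is that each stage's budget is set by its cost: the heavy ranker only ever sees `n_coarse` items, so its per-item latency can be much higher than the retrieval step's.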
A common pitfall
Interview candidates often optimize for immediate engagement (CTR) without modeling long‑term outcomes, which tends to reward clickbait and degrade retention. Another tempting error is treating offline AUC as a proxy for online impact, ignoring position/exposure bias and the distribution shift between training logs and live traffic. Always tie objectives to business and long‑term user value, and plan for debiasing and online validation.
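One way to make the long-term objective explicit is to rank on a weighted blend of predicted click probability and a predicted long-term value signal. This is a minimal sketch under stated assumptions: the long-term value score, item names, predictions, and weights are all hypothetical, and real systems would learn or tune such weights against retention experiments.

```python
from typing import Dict, List, Tuple

def blended_score(p_click: float, long_term_value: float,
                  w_click: float, w_long: float) -> float:
    """Hypothetical multi-objective score: weighted sum of predicted CTR and predicted long-term value."""
    return w_click * p_click + w_long * long_term_value

# name -> (predicted CTR, hypothetical long-term value score in [0, 1])
ITEMS: Dict[str, Tuple[float, float]] = {
    "clickbait": (0.30, 0.05),
    "in_depth": (0.12, 0.60),
    "friend_post": (0.18, 0.40),
}

def rank(items: Dict[str, Tuple[float, float]], w_click: float, w_long: float) -> List[str]:
    return sorted(items, key=lambda k: blended_score(*items[k], w_click, w_long), reverse=True)
```

With `w_long = 0` (pure CTR) the clickbait item ranks first; with equal weights it drops to last, illustrating how the choice of objective, not just model accuracy, determines what the feed promotes.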
Further reading

Covington, Adams, and Sargin, "Deep Neural Networks for YouTube Recommendations" (RecSys 2016) — production two‑stage candidate+ranking architecture.
Rendle, Freudenthaler, Gantner, and Schmidt-Thieme, "BPR: Bayesian Personalized Ranking from Implicit Feedback" (UAI 2009) — pairwise loss for implicit feedback ranking.

Related concepts