Recommenders Under Cold Start and Bias
Asked of: Data Scientist
Last updated

-
What it is
"Recommenders under cold start and bias" covers two intertwined challenges: making good suggestions when users or items have little or no interaction data, and avoiding systematic skews (e.g., popularity or position bias) baked into logs and models. Together they can suppress new users or items and create “rich-get-richer” feedback loops.
-
Why interviewers ask about it
At companies like Meta, recommender work touches Explore, Feed, and Ads, where it matters that new creators, products, or ad campaigns ramp quickly and fairly. Data Scientists are expected to balance short‑term CTR against coverage and fairness, and to keep offline evaluation reliable despite biased logs.
-
Core ideas to know
- Cold start types: new user, new item, new context/domain; each needs different mitigation.
- Use content/metadata and two‑tower retrieval to recommend new items before interactions exist.
- Popularity/exposure bias amplifies already‑popular content, harming long‑tail coverage; watch feedback loops.
- Counter bias with exploration (epsilon‑greedy, Thompson sampling), creator caps, diversity constraints, and re‑ranking.
- Correct offline evaluation for selection/position bias via IPS, SNIPS, or doubly robust estimators; this requires logging propensities at serving time.
- Track beyond‑accuracy metrics: catalog coverage, Gini coefficient of exposure, exposure parity, time‑to‑first‑impression for new items/users.
- Bootstrap labels: pretrain on content, warm‑start from similar items, and sample synthetic negatives; monitor for drift as real interactions arrive.
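
The IPS and SNIPS correction named in the list can be sketched in a few lines. This is a minimal illustration, not a production estimator: the `LoggedImpression` record, the `ips_and_snips` function, and the toy `uniform_target` policy are all hypothetical names, and it assumes each logged impression carries the propensity with which the logging policy showed the item.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class LoggedImpression:
    context: str       # hypothetical request/user key
    action: str        # item the logging policy actually showed
    reward: float      # observed outcome, e.g. 1.0 for a click
    propensity: float  # P(logging policy shows `action` | context), logged at serve time


def ips_and_snips(
    logs: Sequence[LoggedImpression],
    target_prob: Callable[[str, str], float],
    clip: Optional[float] = None,
) -> Tuple[float, float]:
    """Estimate the target policy's mean reward from logs of another policy.

    Returns (IPS, SNIPS). `clip` optionally caps importance weights, which
    trades some bias for lower variance.
    """
    if not logs:
        raise ValueError("need at least one logged impression")
    weights = []
    for row in logs:
        if row.propensity <= 0.0:
            raise ValueError("logged propensities must be positive")
        w = target_prob(row.context, row.action) / row.propensity
        if clip is not None:
            w = min(w, clip)
        weights.append(w)
    weighted_reward = sum(w * row.reward for w, row in zip(weights, logs))
    ips = weighted_reward / len(logs)          # unbiased if unclipped, can be high-variance
    total_weight = sum(weights)
    snips = weighted_reward / total_weight if total_weight > 0 else 0.0  # self-normalized
    return ips, snips


def uniform_target(context: str, action: str) -> float:
    """Toy target policy (an assumption): shows either of two items with probability 0.5."""
    return 0.5


# Toy logs: the logging policy favored the popular item (0.8) over the new one (0.2).
logs = [
    LoggedImpression("u1", "popular", 1.0, 0.8),
    LoggedImpression("u2", "popular", 0.0, 0.8),
    LoggedImpression("u3", "popular", 1.0, 0.8),
    LoggedImpression("u4", "new", 1.0, 0.2),
]
ips, snips = ips_and_snips(logs, uniform_target)
```

On these toy logs the naive logged CTR is 0.75, while IPS gives about 0.94 and SNIPS about 0.86: the single click on the rarely shown new item is up-weighted because the target policy would show it more often. SNIPS is usually the steadier of the two when a few weights are large.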
-
A common pitfall
Candidates often propose “just use matrix factorization” or “collect more data,” ignoring that selection/position bias makes offline AUC deceptively rosy. Others fixate on accuracy while starving new creators because popularity feedback is unchecked. Some describe exploration bandits but forget guardrails (budget, caps, safety) or how to evaluate changes offline with counterfactual estimators. Strong answers tie mitigation to product metrics and concrete logging requirements.
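
One way to make "exploration with guardrails and concrete logging" tangible is an epsilon‑greedy slot policy that stops exploring once a budget is spent and returns the propensity of whatever it showed, so the impression can later feed a counterfactual estimator. This is a sketch under assumptions: the class name `BudgetedEpsilonGreedy`, the helper `propensity`, the item names, and the single-slot setup are hypothetical, and a real system would add per-creator caps and safety filters.

```python
import random
from dataclasses import dataclass, field
from typing import Sequence, Tuple


def propensity(item: str, top: str, new_items: Sequence[str], eps: float) -> float:
    """Probability that the policy below shows `item` for this request.

    Assumes `new_items` contains no duplicates.
    """
    p = (1.0 - eps) if item == top else 0.0
    if eps > 0 and item in new_items:
        p += eps / len(new_items)  # uniform exploration over cold-start candidates
    return p


@dataclass
class BudgetedEpsilonGreedy:
    epsilon: float       # exploration rate while budget remains
    explore_budget: int  # guardrail: max number of exploratory impressions
    rng: random.Random = field(default_factory=random.Random)
    explored: int = 0    # exploratory impressions served so far

    def choose(self, ranked: Sequence[str], new_items: Sequence[str]) -> Tuple[str, float]:
        """Return (item_to_show, propensity_to_log).

        `ranked` is the exploit ordering, best first; `new_items` are the
        cold-start candidates eligible for exploration.
        """
        top = ranked[0]
        can_explore = bool(new_items) and self.explored < self.explore_budget
        eps = self.epsilon if can_explore else 0.0  # budget exhausted -> pure exploit
        if self.rng.random() < eps:
            item = self.rng.choice(list(new_items))
            self.explored += 1
        else:
            item = top
        return item, propensity(item, top, new_items, eps)


# Usage: log the propensity next to the impression at serving time.
policy = BudgetedEpsilonGreedy(epsilon=0.1, explore_budget=1000, rng=random.Random(7))
item, p = policy.choose(["popular_1", "popular_2"], ["new_1", "new_2"])
log_row = {"item": item, "propensity": p}
```

The logged propensity is computed from the effective exploration rate at that moment, so it stays correct after the budget runs out (the top item is then shown with probability 1.0).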
-
Further reading
- Scaling the Instagram Explore recommendations system (Meta Engineering) — Real two‑stage retrieval/ranking at scale; shows practical constraints and signals relevant to cold start. (engineering.fb.com)
- Implement two‑tower retrieval for large‑scale candidate generation (Google Cloud) — Reference architecture; explains using content embeddings and vector search to mitigate cold start. (cloud.google.com)
- A Survey on Popularity Bias in Recommender Systems (Springer, 2024) — Up‑to‑date taxonomy, metrics, and mitigation strategies for exposure/popularity bias. (link.springer.com)