System Design: Scalable, Privacy-Conscious Recommendations Service for a Consumer App
Context
You are designing a real-time recommendations service for a global consumer app (e.g., marketplace-style content such as listings or experiences). Assume:
- Scale: 30–50M MAU, 5–10M items, traffic across Americas, EU, and APAC.
- Surfaces: Home feed (top-N), in-search suggestions (inline), item detail page (similar items), emails/notifications.
- Privacy: Operates under GDPR/CCPA and regional data residency constraints.
Design the service end-to-end, covering online serving and offline pipelines, while balancing relevance, latency, and privacy.
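As a rough illustration of how the scale figures above might translate into capacity numbers, here is a back-of-envelope sketch. Only the MAU and item counts come from the context. The DAU ratio, requests per user, peak factor, and embedding dimension are assumptions chosen purely for illustration and should be replaced with your own.

```python
# Back-of-envelope sizing. Only MAU and ITEMS come from the prompt;
# every other constant below is an illustrative assumption.
MAU = 50_000_000          # upper bound from the prompt
ITEMS = 10_000_000        # upper bound from the prompt

DAU_RATIO = 0.3           # assumption: share of MAU active on a given day
REQS_PER_DAU = 20         # assumption: recommendation requests per active user per day
PEAK_FACTOR = 3           # assumption: peak traffic relative to the daily average
EMBED_DIM = 128           # assumption: item embedding dimension
BYTES_PER_FLOAT = 4       # float32

SECONDS_PER_DAY = 86_400


def average_qps(mau: int, dau_ratio: float, reqs_per_dau: int) -> float:
    """Average requests per second implied by the assumed daily activity."""
    return mau * dau_ratio * reqs_per_dau / SECONDS_PER_DAY


def embedding_bytes(n_items: int, dim: int, bytes_per_float: int) -> int:
    """Raw size of a dense item embedding table, ignoring index overhead."""
    return n_items * dim * bytes_per_float


avg_qps = average_qps(MAU, DAU_RATIO, REQS_PER_DAU)
peak_qps = avg_qps * PEAK_FACTOR
item_vec_gb = embedding_bytes(ITEMS, EMBED_DIM, BYTES_PER_FLOAT) / 1e9

print(f"average QPS ~ {avg_qps:,.0f}")
print(f"peak QPS    ~ {peak_qps:,.0f}")
print(f"item embeddings ~ {item_vec_gb:.2f} GB (raw float32)")
```

Under these assumed constants the raw item embedding table is about 5.12 GB, which suggests it could fit in memory on a single node per region. Index overhead and replication would change that picture.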
Requirements to Cover
- Functional requirements
  - What the service must do, supported surfaces, personalization scope, and fallbacks/degradation behavior.
- APIs
  - Online serving API for recommendations (request/response contracts).
  - Feedback ingestion APIs (impressions, clicks, conversions, hides).
  - Consent/identity APIs as needed.
- Data models
  - Core entities: user, item, and interaction events.
  - Feature store schema (online/offline), embeddings, and versioning.
- Personalization approach
  - Candidate generation and ranking stages.
  - Modeling choices (e.g., collaborative filtering, two-tower retrieval, session-based models).
- Cold-start strategy
  - New user and new item strategies, exploration, and content-based fallbacks.
- Ranking and feedback loops
  - Multi-objective ranking (e.g., CTR, conversion, value), calibration, diversity/novelty.
  - How to log and learn from feedback; avoid bias and clickbait.
- Latency/SLA targets
  - SLOs/SLIs and a latency budget across components, with degrade modes.
- Caching and storage choices
  - Online stores (vectors, features), caches (edge and per-user), invalidation/TTL.
- Backfill and reprocessing
  - Event-time processing, replays, feature backfills, and reproducibility.
- Monitoring and alerting
  - System health, data quality, model performance, drift, and runbooks.
- Experiments support
  - A/B testing, holdouts, sequential testing/bandits, and guardrails.
- Abuse/spam prevention
  - Shilling, bot/fraud detection, adversarial content.
- Regional privacy/consent handling
  - Consent gating, data minimization, data residency, erasure, audit.
Make reasonable assumptions where needed and call them out explicitly.