Scenario
You are designing the online serving infrastructure for a large-scale recommendation system (e.g., a delivery app or e-commerce feed). The interview is infra-focused, not about model architecture.
Requirements
- Serve top-K recommendations for a user on app open / refresh.
- Low latency: p50 < 50 ms, p99 < 200 ms (assume typical mobile product expectations).
- High QPS (spiky traffic), multi-region support.
- Must be resilient to downstream failures (feature store, embedding store, candidate retrieval).
- Results should be reasonably fresh (new inventory/items should appear quickly; personalization should reflect recent behavior within minutes).
What to cover
- High-level architecture (online request path and offline/batch path).
- Candidate generation + ranking services as black boxes (no deep model details), and how they are deployed.
- Caching strategy: what to cache, cache keys, TTL/invalidation, and how to avoid staleness/incorrectness.
- Scaling strategy: stateless vs stateful components, sharding/partitioning, load balancing, autoscaling.
- Data storage choices (feature store / embedding store / item store), consistency expectations, and fallbacks.
- Observability: key metrics, logs/traces, and alerting.