# Design a scalable recommendation serving system
- **Company:** DoorDash
- **Role:** Machine Learning Engineer
- **Category:** System Design
- **Difficulty:** Medium
- **Interview Round:** Onsite
## Scenario
You are designing the **online serving infrastructure** for a large-scale recommendation system (e.g., a delivery app or e-commerce feed). The interview is **infra-focused**, not about model architecture.
## Requirements
- Serve top-*K* recommendations for a user on app open / refresh.
- Low latency: p50 < 50 ms, p99 < 200 ms (assume typical mobile product expectations).
- High QPS (spiky traffic), multi-region support.
- Must be resilient to downstream failures (feature store, embedding store, candidate retrieval); see the request-path sketch after this list.
- Results should be reasonably fresh (new inventory/items should appear quickly; personalization should reflect recent behavior within minutes).
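To make the latency and resilience requirements concrete, here is a minimal sketch of the online request path. All client functions (`fetch_user_features`, `retrieve_candidates`, `rank_candidates`, `fetch_popular_items`) are hypothetical stubs standing in for RPC calls to the real services; the point is the per-stage timeouts that protect the end-to-end budget and the degraded fallbacks when a dependency fails.

```python
import asyncio
from typing import Dict, List

# Hypothetical downstream clients (names assumed for illustration); a real system
# would call the feature store, candidate-retrieval service, and ranker over RPC.
async def fetch_user_features(user_id: str) -> Dict[str, float]:
    return {"orders_30d": 12.0}                    # placeholder payload

async def retrieve_candidates(user_id: str, feats: Dict[str, float]) -> List[str]:
    return [f"item_{i}" for i in range(500)]       # placeholder candidate set

async def rank_candidates(cands: List[str], feats: Dict[str, float]) -> List[str]:
    return cands                                   # placeholder: identity ranking

async def fetch_popular_items(region: str) -> List[str]:
    return [f"popular_{i}" for i in range(100)]    # precomputed regional fallback

async def get_recommendations(user_id: str, region: str, k: int = 20) -> List[str]:
    """Online path: features -> candidates -> ranking, with per-stage timeouts so
    the end-to-end budget (p99 < 200 ms) holds even when a dependency is slow."""
    try:
        feats = await asyncio.wait_for(fetch_user_features(user_id), timeout=0.030)
    except (asyncio.TimeoutError, ConnectionError):
        feats = {}  # degrade to non-personalized defaults instead of failing the request

    try:
        cands = await asyncio.wait_for(retrieve_candidates(user_id, feats), timeout=0.050)
        ranked = await asyncio.wait_for(rank_candidates(cands, feats), timeout=0.080)
        return ranked[:k]
    except (asyncio.TimeoutError, ConnectionError):
        # Last-resort fallback: serve precomputed regional popular items so the
        # feed is never empty when retrieval or ranking is unavailable.
        return (await fetch_popular_items(region))[:k]

if __name__ == "__main__":
    print(asyncio.run(get_recommendations("user_123", "sf_bay_area", k=5)))
```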
## What to cover
1. High-level architecture (online request path and offline/batch path).
2. Candidate generation and ranking services, treated as black boxes (no deep model details), and how they are deployed.
3. **Caching strategy**: what to cache, cache keys, TTL/invalidation, and how to avoid staleness/incorrectness (see the caching sketch after this list).
4. **Scaling strategy**: stateless vs. stateful components, sharding/partitioning, load balancing, autoscaling (see the sharding sketch after this list).
5. Data storage choices (feature store/embedding store/item store), consistency expectations, and fallbacks.
6. Observability: key metrics, logs/traces, and alerting (see the metrics sketch after this list).
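For item 3, one possible shape of the results cache is sketched below: a read-through cache keyed on user, surface, and model version, with a short TTL to bound staleness. The in-process `TTLCache` and the key scheme are illustrative assumptions; a production system would typically put the same logic in front of a shared store such as Redis or Memcached.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

class TTLCache:
    """Minimal in-process read-through cache with TTL; a real deployment would
    use a shared store (e.g., Redis/Memcached) with the same key scheme."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[List[str], float]] = {}

    def get(self, key: str) -> Optional[List[str]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:      # lazily expire stale entries
            del self._store[key]
            return None
        return value

    def put(self, key: str, value: List[str]) -> None:
        self._store[key] = (value, time.monotonic() + self.ttl)

def cache_key(user_id: str, surface: str, model_version: str) -> str:
    # Including the model/feature version in the key makes rollouts self-invalidating:
    # a new version simply stops hitting old entries instead of serving stale results.
    return f"recs:{surface}:{model_version}:{user_id}"

def get_recs_cached(cache: TTLCache, user_id: str, surface: str,
                    model_version: str, compute: Callable[[], List[str]]) -> List[str]:
    key = cache_key(user_id, surface, model_version)
    hit = cache.get(key)
    if hit is not None:
        return hit
    recs = compute()          # full candidate-generation + ranking path
    cache.put(key, recs)      # short TTL (e.g., 1-5 min) bounds personalization staleness
    return recs

# Usage: a short-TTL per-user results cache absorbs refresh spam and retry storms.
cache = TTLCache(ttl_seconds=120)
recs = get_recs_cached(cache, "user_123", "home_feed", "ranker_v42",
                       compute=lambda: [f"item_{i}" for i in range(20)])
print(recs[:5])
```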
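For item 4, the sketch below shows one common partitioning approach: consistent hashing with virtual nodes for routing embedding-store (or feature-store) lookups, so adding or removing a shard remaps only a small fraction of keys. The shard names and vnode count are placeholder assumptions, not a prescribed configuration; serving pods themselves stay stateless and scale horizontally behind a load balancer.

```python
import bisect
import hashlib
from typing import Dict, List

class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes: adding or removing an
    embedding-store shard remaps roughly 1/N of keys instead of reshuffling all of them."""
    def __init__(self, shards: List[str], vnodes: int = 100):
        self._ring: List[int] = []
        self._owner: Dict[int, str] = {}
        for shard in shards:
            for v in range(vnodes):
                h = self._hash(f"{shard}#{v}")
                self._ring.append(h)
                self._owner[h] = shard
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self._ring, h) % len(self._ring)
        return self._owner[self._ring[idx]]

# Usage: route item-embedding lookups to shards by item id.
ring = ConsistentHashRing([f"embed-shard-{i}" for i in range(4)])
for item in ["item_17", "item_42", "item_99"]:
    print(item, "->", ring.shard_for(item))
```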
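For item 6, a toy sketch of per-stage instrumentation: counters plus latency observations recorded by a decorator. The `Metrics` class and metric names are assumptions for illustration; a real service would export to a system such as Prometheus or StatsD and alert on p99 latency, error rate, cache hit rate, and freshness lag.

```python
import time
from collections import defaultdict

class Metrics:
    """Toy in-process metrics sink standing in for a Prometheus/StatsD client."""
    def __init__(self):
        self.counters = defaultdict(int)
        self.latencies_ms = defaultdict(list)

    def incr(self, name: str, value: int = 1) -> None:
        self.counters[name] += value

    def observe_ms(self, name: str, ms: float) -> None:
        self.latencies_ms[name].append(ms)

metrics = Metrics()

def timed(stage: str):
    """Decorator recording per-stage latency and error counts
    (feature fetch, candidate retrieval, ranking, cache lookup)."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics.incr(f"{stage}.errors")
                raise
            finally:
                metrics.observe_ms(f"{stage}.latency", (time.monotonic() - start) * 1000)
        return inner
    return wrap

@timed("ranking")
def rank_stub(candidates):
    return sorted(candidates)

rank_stub(["item_3", "item_1", "item_2"])
metrics.incr("cache.hits")     # together with cache.misses, gives the hit rate
print(dict(metrics.counters), {k: len(v) for k, v in metrics.latencies_ms.items()})
```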
## Quick Answer
This question evaluates an engineer's ability to design scalable, low-latency, and resilient online recommendation serving infrastructure, emphasizing caching strategies, partitioning/sharding, data storage consistency, and observability.