Scenario
You are designing the online serving infrastructure for a large-scale recommendation system (e.g., a delivery app or e-commerce feed). The interview is infra-focused, not about model architecture.
Requirements
- Serve top-K recommendations for a user on app open / refresh.
- Low latency: p50 < 50 ms, p99 < 200 ms (assume typical mobile product expectations).
- High QPS (spiky traffic), multi-region support.
- Must be resilient to downstream failures (feature store, embedding store, candidate retrieval).
- Results should be reasonably fresh (new inventory/items should appear quickly; personalization should reflect recent behavior within minutes).
What to cover
- High-level architecture (online request path and offline/batch path).
- Candidate generation + ranking services as black boxes (no deep model details), and how they are deployed.
- Caching strategy: what to cache, cache keys, TTL/invalidation, and how to avoid staleness/incorrectness.
- Scaling strategy: stateless vs stateful components, sharding/partitioning, load balancing, autoscaling.
- Data storage choices (feature store / embedding store / item store), consistency expectations, and fallbacks.
- Observability: key metrics, logs/traces, and alerting.