# Design a scalable recommendation serving system
- **Company:** DoorDash
- **Role:** Machine Learning Engineer
- **Category:** System Design
- **Difficulty:** Medium
- **Interview Round:** Onsite
## Scenario
You are designing the **online serving infrastructure** for a large-scale recommendation system (e.g., a delivery app or e-commerce feed). The interview is **infra-focused**, not about model architecture.
## Requirements
- Serve top-*K* recommendations for a user on app open / refresh.
- Low latency: p50 < 50 ms, p99 < 200 ms (assume typical mobile product expectations).
- High QPS (spiky traffic), multi-region support.
- Must be resilient to downstream failures (feature store, embedding store, candidate retrieval); see the request-path sketch after this list.
- Results should be reasonably fresh (new inventory/items should appear quickly; personalization should reflect recent behavior within minutes).
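To make the latency and resilience requirements concrete, here is a minimal sketch of the online request path. All client functions (`fetch_user_features`, `retrieve_candidates`, `rank_candidates`, `fetch_popular_items`) are hypothetical stubs standing in for RPC calls to the real services; the point is the per-stage timeouts that protect the end-to-end budget and the degraded fallbacks when a dependency fails.

```python
import asyncio
from typing import Dict, List

# Hypothetical downstream clients (names assumed for illustration); a real system
# would call the feature store, candidate-retrieval service, and ranker over RPC.
async def fetch_user_features(user_id: str) -> Dict[str, float]:
    return {"orders_30d": 12.0}                    # placeholder payload

async def retrieve_candidates(user_id: str, feats: Dict[str, float]) -> List[str]:
    return [f"item_{i}" for i in range(500)]       # placeholder candidate set

async def rank_candidates(cands: List[str], feats: Dict[str, float]) -> List[str]:
    return cands                                   # placeholder: identity ranking

async def fetch_popular_items(region: str) -> List[str]:
    return [f"popular_{i}" for i in range(100)]    # precomputed regional fallback

async def get_recommendations(user_id: str, region: str, k: int = 20) -> List[str]:
    """Online path: features -> candidates -> ranking, with per-stage timeouts so
    the end-to-end budget (p99 < 200 ms) holds even when a dependency is slow."""
    try:
        feats = await asyncio.wait_for(fetch_user_features(user_id), timeout=0.030)
    except (asyncio.TimeoutError, ConnectionError):
        feats = {}  # degrade to non-personalized defaults instead of failing the request

    try:
        cands = await asyncio.wait_for(retrieve_candidates(user_id, feats), timeout=0.050)
        ranked = await asyncio.wait_for(rank_candidates(cands, feats), timeout=0.080)
        return ranked[:k]
    except (asyncio.TimeoutError, ConnectionError):
        # Last-resort fallback: serve precomputed regional popular items so the
        # feed is never empty when retrieval or ranking is unavailable.
        return (await fetch_popular_items(region))[:k]

if __name__ == "__main__":
    print(asyncio.run(get_recommendations("user_123", "sf_bay_area", k=5)))
```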
## What to cover
1. High-level architecture (online request path and offline/batch path).
2. Candidate generation and ranking services, treated as black boxes (no deep model details), and how they are deployed.
3. **Caching strategy**: what to cache, cache keys, TTL/invalidation, and how to avoid staleness/incorrectness (see the caching sketch after this list).
4. **Scaling strategy**: stateless vs. stateful components, sharding/partitioning, load balancing, autoscaling (see the sharding sketch after this list).
5. Data storage choices (feature store/embedding store/item store), consistency expectations, and fallbacks.
6. Observability: key metrics, logs/traces, and alerting (see the metrics sketch after this list).
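For item 3, one possible shape of the results cache is sketched below: a read-through cache keyed on user, surface, and model version, with a short TTL to bound staleness. The in-process `TTLCache` and the key scheme are illustrative assumptions; a production system would typically put the same logic in front of a shared store such as Redis or Memcached.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

class TTLCache:
    """Minimal in-process read-through cache with TTL; a real deployment would
    use a shared store (e.g., Redis/Memcached) with the same key scheme."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[List[str], float]] = {}

    def get(self, key: str) -> Optional[List[str]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:      # lazily expire stale entries
            del self._store[key]
            return None
        return value

    def put(self, key: str, value: List[str]) -> None:
        self._store[key] = (value, time.monotonic() + self.ttl)

def cache_key(user_id: str, surface: str, model_version: str) -> str:
    # Including the model/feature version in the key makes rollouts self-invalidating:
    # a new version simply stops hitting old entries instead of serving stale results.
    return f"recs:{surface}:{model_version}:{user_id}"

def get_recs_cached(cache: TTLCache, user_id: str, surface: str,
                    model_version: str, compute: Callable[[], List[str]]) -> List[str]:
    key = cache_key(user_id, surface, model_version)
    hit = cache.get(key)
    if hit is not None:
        return hit
    recs = compute()          # full candidate-generation + ranking path
    cache.put(key, recs)      # short TTL (e.g., 1-5 min) bounds personalization staleness
    return recs

# Usage: a short-TTL per-user results cache absorbs refresh spam and retry storms.
cache = TTLCache(ttl_seconds=120)
recs = get_recs_cached(cache, "user_123", "home_feed", "ranker_v42",
                       compute=lambda: [f"item_{i}" for i in range(20)])
print(recs[:5])
```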
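For item 4, the sketch below shows one common partitioning approach: consistent hashing with virtual nodes for routing embedding-store (or feature-store) lookups, so adding or removing a shard remaps only a small fraction of keys. The shard names and vnode count are placeholder assumptions, not a prescribed configuration; serving pods themselves stay stateless and scale horizontally behind a load balancer.

```python
import bisect
import hashlib
from typing import Dict, List

class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes: adding or removing an
    embedding-store shard remaps roughly 1/N of keys instead of reshuffling all of them."""
    def __init__(self, shards: List[str], vnodes: int = 100):
        self._ring: List[int] = []
        self._owner: Dict[int, str] = {}
        for shard in shards:
            for v in range(vnodes):
                h = self._hash(f"{shard}#{v}")
                self._ring.append(h)
                self._owner[h] = shard
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self._ring, h) % len(self._ring)
        return self._owner[self._ring[idx]]

# Usage: route item-embedding lookups to shards by item id.
ring = ConsistentHashRing([f"embed-shard-{i}" for i in range(4)])
for item in ["item_17", "item_42", "item_99"]:
    print(item, "->", ring.shard_for(item))
```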
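For item 6, a toy sketch of per-stage instrumentation: counters plus latency observations recorded by a decorator. The `Metrics` class and metric names are assumptions for illustration; a real service would export to a system such as Prometheus or StatsD and alert on p99 latency, error rate, cache hit rate, and freshness lag.

```python
import time
from collections import defaultdict

class Metrics:
    """Toy in-process metrics sink standing in for a Prometheus/StatsD client."""
    def __init__(self):
        self.counters = defaultdict(int)
        self.latencies_ms = defaultdict(list)

    def incr(self, name: str, value: int = 1) -> None:
        self.counters[name] += value

    def observe_ms(self, name: str, ms: float) -> None:
        self.latencies_ms[name].append(ms)

metrics = Metrics()

def timed(stage: str):
    """Decorator recording per-stage latency and error counts
    (feature fetch, candidate retrieval, ranking, cache lookup)."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics.incr(f"{stage}.errors")
                raise
            finally:
                metrics.observe_ms(f"{stage}.latency", (time.monotonic() - start) * 1000)
        return inner
    return wrap

@timed("ranking")
def rank_stub(candidates):
    return sorted(candidates)

rank_stub(["item_3", "item_1", "item_2"])
metrics.incr("cache.hits")     # together with cache.misses, gives the hit rate
print(dict(metrics.counters), {k: len(v) for k, v in metrics.latencies_ms.items()})
```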
## Quick Answer
This question evaluates an engineer's ability to design scalable, low-latency, and resilient online recommendation serving infrastructure, emphasizing caching strategies, partitioning/sharding, data storage consistency, and observability.