System Design: Recommendation Service (Serving + Data Infrastructure)
Context
Design a production-ready recommendation service for a large-scale consumer app. The service should return personalized item recommendations with strict latency SLOs and support continuous model training. Assume millions of users and items, peak thousands of RPS, and multi-region deployment. The design should cover both online serving and offline/batch ML workflows.
Requirements
- Client data fetching and API design
  - Public, versioned API for fetching recommendations.
  - Logging of user interactions (e.g., view, click, add-to-cart) for training and real-time adaptation.
  - Caching semantics and idempotency where appropriate.
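The API requirements above can be sketched with a minimal, framework-free handler. The endpoint paths, field names (`surface`, `ttl_seconds`), and response shape are assumptions for illustration, not a prescribed contract; the `request_id` echo supports idempotent retries, and `ttl_seconds` is a cache hint for clients and edges.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class RecRequest:
    user_id: str
    surface: str  # placement requesting recs, e.g. "home_feed" (illustrative)
    limit: int = 20
    # Client-generated ID; echoed back so retries can be deduplicated.
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def get_recommendations_v1(req: RecRequest) -> dict:
    """Handler for a versioned endpoint such as
    GET /v1/users/{user_id}/recommendations (path is an assumption)."""
    # Placeholder ranking; a real service would call retrieval + scoring here.
    items = [{"item_id": f"item-{i}", "score": 1.0 / (i + 1)}
             for i in range(req.limit)]
    return {
        "request_id": req.request_id,  # echoed for idempotent retries
        "items": items,
        "ttl_seconds": 60,             # client/edge cache hint
    }

INTERACTION_LOG: list[dict] = []

def log_interaction(user_id: str, item_id: str, signal: str) -> None:
    """Sketch of POST /v1/interactions: events are appended to a log that
    both training pipelines and near-real-time features consume."""
    INTERACTION_LOG.append({"user_id": user_id, "item_id": item_id,
                            "signal": signal, "ts": time.time()})
```

Keeping the version in the path (`/v1/...`) lets the response shape evolve without breaking existing clients.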
- Application serving tier
  - Multiple stateless application servers behind a reverse proxy/load balancer.
  - Horizontal scalability, health checks, autoscaling.
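Because the servers are stateless, the balancer only needs health state, not session affinity. A minimal round-robin sketch (the `LoadBalancer` class and its method names are illustrative; real deployments would use a proxy such as Envoy or NGINX with active health probes):

```python
import itertools

class LoadBalancer:
    """Round-robin over healthy, stateless backends (minimal sketch)."""

    def __init__(self, backends: list[str]):
        self.backends = backends
        self.healthy = set(backends)       # updated by health checks
        self._rr = itertools.cycle(backends)

    def mark_unhealthy(self, backend: str) -> None:
        self.healthy.discard(backend)

    def mark_healthy(self, backend: str) -> None:
        self.healthy.add(backend)

    def pick(self) -> str:
        # Skip unhealthy backends; give up after one full rotation.
        for _ in range(len(self.backends)):
            b = next(self._rr)
            if b in self.healthy:
                return b
        raise RuntimeError("no healthy backends")
```

Statelessness is what makes autoscaling safe here: any replica can serve any request, so adding or draining instances never loses user state.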
- Caching and latency targets
  - Multi-layer caching strategy (client/edge, mid-tier, feature/model caches).
  - Specify end-to-end latency targets (p50/p95), and a latency budget per component.
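A per-component latency budget can be written down explicitly and checked against the end-to-end target. The 200 ms p95 target and the individual allocations below are assumed numbers for illustration, not recommendations:

```python
P95_BUDGET_MS = 200  # assumed end-to-end p95 target

# Illustrative allocation; a real budget comes from measured baselines.
budget_ms = {
    "edge_cache_lookup": 5,
    "lb_and_routing": 5,
    "candidate_retrieval": 60,
    "feature_fetch": 40,
    "model_scoring": 60,
    "ranking_and_response": 30,
}

# The components must fit inside the end-to-end budget.
assert sum(budget_ms.values()) <= P95_BUDGET_MS
```

Keeping the budget in code (or config) makes regressions visible: if one component's measured p95 exceeds its allocation, the team knows exactly where the headroom went.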
- Storage schemas and data modeling
  - Schemas for users, items, and interaction logs with fields: user_id, item_id, signal, timestamp.
  - Indexing and partitioning strategies for OLTP and OLAP.
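The interaction schema and the two partitioning strategies can be sketched as follows: hash-partition by `user_id` for OLTP point reads, and day-bucket by `timestamp` for OLAP time-range scans. The shard count and helper names are assumptions:

```python
import time
import zlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Interaction:
    user_id: str
    item_id: str
    signal: str       # e.g. "view", "click", "add_to_cart"
    timestamp: float  # epoch seconds

def oltp_shard(user_id: str, num_shards: int = 64) -> int:
    """Hash-partition by user_id so one user's rows live on one shard.
    crc32 is used here because it is stable across processes (unlike
    Python's built-in hash for strings)."""
    return zlib.crc32(user_id.encode()) % num_shards

def olap_partition(timestamp: float) -> str:
    """Day-bucket partition key so warehouse scans prune by date range."""
    return time.strftime("%Y-%m-%d", time.gmtime(timestamp))
```

The same event lands in both layouts: the OLTP shard serves "recent activity for user X" reads, while the OLAP partition serves "all clicks between dates" training scans.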
- Offline/online coexistence
  - How batch training reads the historical data.
  - How online serving performs low-latency reads and writes, including near-real-time feature updates.
  - Strategies to ensure feature parity between offline and online environments.
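One common parity strategy is to define each feature transformation exactly once and import it from both the batch pipeline and the online service, so the two paths cannot drift. A minimal sketch (the feature name and counter inputs are illustrative):

```python
def ctr_feature(click_count: int, view_count: int) -> float:
    """Click-through-rate feature, defined once and shared.

    Offline: the batch job applies it to counters aggregated from
    historical interaction logs. Online: the serving path applies the
    *same function* to near-real-time counters before model scoring,
    so offline training and online inference see identical semantics.
    """
    # max(..., 1) guards the cold-start case of zero views.
    return click_count / max(view_count, 1)
```

The alternative, reimplementing the transform in two codebases, is a classic source of training/serving skew; a shared library (or a feature store that materializes one definition to both environments) avoids it.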
- Scaling, consistency, and fault tolerance
  - Sharding, replication, and multi-region considerations.
  - Consistency choices (strong vs eventual), read-your-writes where needed.
  - Fault isolation, retries, circuit breakers, fallbacks, and degraded modes.
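The circuit-breaker-plus-fallback pattern from the last bullet can be sketched in a few lines. Thresholds, timeouts, and the fallback behavior (e.g., serving a popularity-based list as a degraded mode) are assumptions for illustration:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker with a degraded-mode fallback (sketch)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None = closed; timestamp = open

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fast-fail into degraded mode
            # Half-open: allow one trial call; one more failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip the breaker
            return fallback()
        self.failures = 0  # success closes the breaker fully
        return result
```

While the breaker is open, the personalized ranker is not called at all; the fallback keeps the product functional (e.g., non-personalized popular items) and gives the failing dependency time to recover instead of being hammered by retries.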
Provide a clear, componentized architecture and justify key trade-offs.