You are asked to design a real-time recommendation system for a large-scale consumer product (for example, recommending items or content to users in a mobile app).
The system should:
- Serve personalized recommendations with **low latency** (end-to-end P95 latency target: e.g., < 100 ms from request to response at the service layer).
- Support **millions of daily active users** and **tens of thousands of candidate items** that change over time.
- Continuously incorporate new user interactions (clicks, views, purchases, etc.) to keep recommendations fresh.
Address the following aspects in your design:
- **High-level Architecture & Data Flow**
  - Describe the overall pipeline from data generation to model training to online serving.
  - Explicitly separate **offline**, **nearline**, and **online** components where applicable.
- **Feature Engineering & Feature Store**
  - What kinds of user, item, and context features would you use?
  - How would you design a **feature store** to support both offline training and online inference with consistent features?
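One common way to keep offline and online features consistent is to route both paths through the same transformation code. A minimal sketch, with hypothetical feature names and the clock passed in explicitly so that training (historical timestamps) and serving (wall clock) can each supply their own `now`:

```python
from datetime import datetime, timezone

def user_features(raw, now):
    """Shared transformation used by both the offline training pipeline and
    the online serving path, avoiding train/serve skew. Feature names are
    illustrative assumptions."""
    age_days = (now - raw["signup_time"]).days
    return {
        "account_age_days": age_days,
        "clicks_7d": raw.get("clicks_7d", 0),
        "is_new_user": age_days < 7,
    }

raw = {"signup_time": datetime(2024, 1, 1, tzinfo=timezone.utc), "clicks_7d": 3}
feats = user_features(raw, now=datetime(2024, 1, 10, tzinfo=timezone.utc))
print(feats)  # -> {'account_age_days': 9, 'clicks_7d': 3, 'is_new_user': False}
```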
- **Modeling Approach**
  - Propose a baseline model (e.g., simple heuristics or a shallow model), then a more advanced model (e.g., deep learning–based ranking).
  - Explain how you would structure the system as **candidate generation + ranking** (or another decomposition) and why.
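The candidate generation + ranking split can be sketched as two stages over embedding scores. Everything below is an illustrative assumption: in a real system stage 1 would query an approximate nearest-neighbor index over millions of items, and stage 2 would call a heavier model with many more features.

```python
import heapq

# Hypothetical in-memory item embeddings; stands in for an ANN index.
ITEM_EMBEDDINGS = {
    "item_a": [0.9, 0.1],
    "item_b": [0.2, 0.8],
    "item_c": [0.7, 0.3],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def generate_candidates(user_emb, k=2):
    """Stage 1: cheap retrieval -- top-k items by embedding similarity."""
    return heapq.nlargest(k, ITEM_EMBEDDINGS,
                          key=lambda i: dot(user_emb, ITEM_EMBEDDINGS[i]))

def rank(user_emb, candidates):
    """Stage 2: precise ordering of the small candidate set; here just a
    re-score, but in practice a much more expensive model."""
    return sorted(candidates,
                  key=lambda i: dot(user_emb, ITEM_EMBEDDINGS[i]),
                  reverse=True)

user_emb = [1.0, 0.0]
recs = rank(user_emb, generate_candidates(user_emb))
print(recs)  # -> ['item_a', 'item_c']
```

The decomposition matters because only the cheap stage ever touches the full catalog; the expensive model sees a few hundred candidates at most.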
- **Cold Start Problem**
  - How would you handle **new users** with little or no history?
  - How would you handle **new items** with no interaction data?
  - Discuss multiple strategies (e.g., content-based features, popularity-based recommendations, exploration).
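A sketch combining two of these strategies on the new-user side: a popularity fallback when there is no history, plus epsilon-greedy exploration otherwise (the threshold, list sizes, and names are illustrative assumptions; new items would additionally need content-based features to enter the candidate pool at all):

```python
import random

def recommend(user_history, personalized, popular, epsilon=0.1, rng=random):
    """Cold-start handling sketch:
    - users with no history get the popularity list (no personalization signal),
    - existing users explore random popular items with probability epsilon,
    - otherwise they get the personalized list."""
    if not user_history:
        return popular
    if rng.random() < epsilon:
        return rng.sample(popular, k=min(3, len(popular)))
    return personalized
```

For example, `recommend([], personalized, popular)` always returns the popularity list, while `epsilon=0.0` disables exploration entirely.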
- **Latency vs. Accuracy Trade-offs**
  - Given a strict latency budget, how would you design the serving path (caching, pre-computation, approximate search, etc.)?
  - Discuss concrete strategies for trading off model complexity/accuracy against serving latency and system cost.
  - Explain where you would use caching (e.g., user-level, item-level, or result-level caches) and what consistency/TTL strategies you might choose.
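For the result-level caching point, a minimal TTL cache sketch. The TTL value and the injectable clock are illustrative assumptions; a production system would more likely use Redis or a similar store with per-key TTLs.

```python
import time

class TTLCache:
    """Result-level cache: serve recently computed recommendation lists for a
    short TTL, trading freshness for latency."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable for testing
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # stale: evict and force a recompute
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)
```

A short TTL bounds staleness: within the window a user sees cached results; after expiry the next request pays the full model-serving cost and refreshes the entry.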
- **Monitoring, Evaluation, and Iteration**
  - What **online metrics** and **offline metrics** would you track to evaluate the recommender system?
  - How would you set up **A/B testing** or other online experiments?
  - Describe what you would monitor in production (e.g., model performance drift, feature distribution shift, latency, error rates) and how you would respond.
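Deterministic hash-based bucketing is the usual backbone of A/B assignment: a user must land in the same variant on every request. A sketch, assuming a hypothetical experiment name and an even split:

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Stable A/B bucketing sketch: hash (experiment, user) so each user
    stays in one variant for the experiment's lifetime, and different
    experiments bucket independently."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Logging the assigned variant alongside each impression then lets offline analysis attribute online metrics (CTR, dwell time, conversions) to the right arm.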
- **Scalability, Reliability, and Other Practical Considerations**
  - Discuss storage and computation choices (e.g., streaming systems, message queues, scalable storage for logs, features, and models).
  - How would you design for fault tolerance, graceful degradation, and fallback behavior (e.g., if the model server is down or too slow)?
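Fallback behavior can be sketched as a deadline around the model call, with a precomputed popularity list as the degraded answer. The timeout value and callable names are illustrative assumptions:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def serve(user_id, model_call, fallback, timeout_s=0.08):
    """Graceful degradation sketch: call the (hypothetical) model server with
    a deadline; on timeout or any error, return a precomputed fallback list
    instead of failing the request."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(model_call, user_id)
        try:
            return future.result(timeout=timeout_s)
        except Exception:  # deadline exceeded, connection error, model crash, ...
            return fallback

healthy = serve("u1", lambda uid: ["model_recs"], ["popular"])
degraded = serve("u1", lambda uid: time.sleep(0.3) or ["model_recs"],
                 ["popular"], timeout_s=0.02)
print(healthy, degraded)  # -> ['model_recs'] ['popular']
```

Serving something slightly worse (popular items) is almost always better than serving an error, so the fallback path should be cheap, precomputed, and independent of the model server.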
Clearly explain your assumptions and walk through your design step by step.