Design a real-time recommendation system
Company: Google
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Onsite
You are asked to design a **real-time recommendation system** for a large-scale consumer product (for example, recommending items or content to users in a mobile app).
The system should:
- Serve personalized recommendations with **low latency** (e.g., an end-to-end P95 target of < 100 ms from request to response at the service layer).
- Support **millions of daily active users** and **tens of thousands of candidate items** that change over time.
- Continuously incorporate new user interactions (clicks, views, purchases, etc.) to keep recommendations fresh.
Address the following aspects in your design:
1. **High-level Architecture & Data Flow**
- Describe the overall pipeline from data generation to model training to online serving.
- Explicitly separate **offline**, **nearline**, and **online** components where applicable.
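As an illustration of the kind of separation an answer might sketch, here is a toy version of the three tiers. The in-memory structures stand in for real infrastructure (a data lake for the log, a stream processor for nearline state, a serving store for scores); all names are hypothetical.

```python
# Minimal offline / nearline / online split, with in-memory stand-ins
# for the log store, stream-updated state, and serving store.
from collections import defaultdict

interaction_log = []                      # offline: durable raw event log
user_recent_clicks = defaultdict(list)    # nearline: stream-updated per-user state
model_scores = {}                         # online: published scores read at request time

def log_event(user_id, item_id):
    """Write path: append to the durable log and update nearline state
    (in production, e.g., via a message queue consumer)."""
    interaction_log.append((user_id, item_id))
    user_recent_clicks[user_id].append(item_id)

def batch_train():
    """Offline: periodic job over the full log; popularity counting here
    is a stand-in for real model training."""
    counts = defaultdict(int)
    for _, item_id in interaction_log:
        counts[item_id] += 1
    model_scores.update(counts)           # publish to the serving store

def serve(user_id, k=2):
    """Online: low-latency read path, filtering items the user just saw."""
    seen = set(user_recent_clicks[user_id])
    ranked = sorted(model_scores, key=model_scores.get, reverse=True)
    return [i for i in ranked if i not in seen][:k]
```

The key point the sketch makes: the online path only reads precomputed state, the nearline path keeps that state fresh between batch runs, and the offline path does the heavy lifting asynchronously.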
2. **Feature Engineering & Feature Store**
- What kinds of user, item, and context features would you use?
- How would you design a **feature store** to support both offline training and online inference with consistent features?
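One common answer to the consistency question is to define each feature exactly once and call that same definition from both the offline training job and the online serving path. A minimal sketch (feature names and the toy store are illustrative assumptions, not a real feature-store API):

```python
import math

def compute_user_features(raw_events, now):
    """Single feature definition shared by offline training and online
    serving -- the main lever for train/serve consistency.
    `raw_events` is a list of (timestamp, event_type) pairs."""
    n = len(raw_events)
    last_ts = max((ts for ts, _ in raw_events), default=None)
    return {
        "event_count_log": math.log1p(n),
        "seconds_since_last_event": (now - last_ts) if last_ts is not None else -1.0,
    }

class FeatureStore:
    """Toy online feature store keyed by entity id. An offline job would
    write point-in-time snapshots of the same features to a training table."""
    def __init__(self):
        self._online = {}

    def write(self, entity_id, features):
        self._online[entity_id] = features

    def read(self, entity_id):
        return self._online.get(entity_id, {})
```

In a real system the offline side must also be point-in-time correct (only features computable as of each training example's timestamp), which the snapshot comment gestures at.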
3. **Modeling Approach**
- Propose a baseline model (e.g., simple heuristics or a shallow model) and then a more advanced model (e.g., deep learning–based ranking).
- Explain how you would structure the system as **candidate generation + ranking** (or another decomposition) and why.
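The two-stage decomposition can be made concrete with a small sketch: a cheap retrieval stage over all items followed by a more expensive re-ranking stage over the shortlist. Here both stages use the same inner-product score so the example stays self-contained; in a real system stage 1 would be an ANN index (e.g., ScaNN or FAISS) and stage 2 a heavier learned model. Embedding sizes and counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
item_embs = rng.normal(size=(1000, 16))          # learned item embeddings (illustrative)
item_embs /= np.linalg.norm(item_embs, axis=1, keepdims=True)

def candidate_generation(user_emb, k=50):
    """Stage 1: cheap retrieval via inner product over all items.
    Brute force here; an ANN index in production."""
    scores = item_embs @ user_emb
    return np.argpartition(-scores, k)[:k]       # top-k, unordered

def rank(user_emb, candidates):
    """Stage 2: re-score only the small candidate set with the
    'expensive' model (a stand-in re-scoring in this sketch)."""
    scores = item_embs[candidates] @ user_emb
    return candidates[np.argsort(-scores)]

user_emb = rng.normal(size=16)
cands = candidate_generation(user_emb)
ranked = rank(user_emb, cands)
```

The reason for the split shows up in the shapes: stage 1 touches all 1,000 items with a cheap operation, stage 2 touches only 50 with whatever model you can afford inside the latency budget.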
4. **Cold Start Problem**
- How would you handle **new users** with little or no history?
- How would you handle **new items** with no interaction data?
- Discuss multiple strategies (e.g., content-based features, popularity-based recommendations, exploration).
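Two of those strategies can be combined in one simple policy: popularity backfill for new users, plus epsilon-greedy exploration slots so new items can accumulate interaction data. A hedged sketch (the function, parameter values, and the 20% exploration share are illustrative, not tuned):

```python
import random

def cold_start_recs(popular_items, new_items, k=5, epsilon=0.2, rng=None):
    """Reserve ~epsilon of the k slots for randomly sampled new items
    (exploration); fill the rest from a global popularity list (exploitation).
    Illustrative policy, not a production bandit."""
    rng = rng or random.Random(0)
    n_explore = int(round(k * epsilon))
    explore = rng.sample(new_items, min(n_explore, len(new_items)))
    exploit = [i for i in popular_items if i not in explore][: k - len(explore)]
    return exploit + explore
```

A fuller answer would replace the fixed epsilon with a contextual bandit (e.g., Thompson sampling over content-based features), but the sketch captures the core explore/exploit trade.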
5. **Latency vs. Accuracy Trade-offs**
- Given a strict latency budget, how would you design the serving path (caching, pre-computation, approximate search, etc.)?
- Discuss concrete strategies to trade off model complexity/accuracy against serving latency and system cost.
- Explain where you would use caching (e.g., user-level, item-level, or result-level caches) and what consistency/TTL strategies you might choose.
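A result-level cache with a TTL is the simplest of those options to make concrete. The sketch below is a stand-in for something like Redis with `EXPIRE`; the explicit `now` parameter just makes the expiry behavior easy to exercise. The trade-off it encodes: results may be up to `ttl` seconds stale in exchange for skipping the full serving path.

```python
import time

class TTLCache:
    """Minimal result-level cache with a per-entry time-to-live."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}            # key -> (expires_at, value)

    def put(self, key, value, now=None):
        now = time.time() if now is None else now
        self._store[key] = (now + self.ttl, value)

    def get(self, key, now=None):
        """Return the cached value, or None if absent or expired."""
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if now >= expires_at:
            del self._store[key]    # lazy eviction on read
            return None
        return value
```

Choosing the TTL is itself a latency/freshness trade: a user-level result cache might use seconds-to-minutes TTLs, while an item-feature cache can often tolerate longer.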
6. **Monitoring, Evaluation, and Iteration**
- What **online metrics** and **offline metrics** would you track to evaluate the recommender system?
- How would you set up **A/B testing** or other online experiments?
- Describe what you would monitor in production (e.g., model performance drift, feature distribution shift, latency, error rates) and how you would respond.
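For feature distribution shift specifically, one widely used signal is the Population Stability Index (PSI) between a training-time sample and a serving-time sample of a feature. A self-contained sketch (the binning scheme and the ">0.2 warrants investigation" rule of thumb are common conventions, not universal thresholds):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between an expected (training-time) and
    actual (serving-time) sample of one numeric feature. Near 0 means the
    distributions match; commonly, PSI > 0.2 triggers investigation."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        eps = 1e-6                          # avoid log(0) on empty bins
        return [max(c / len(xs), eps) for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In production this would run per feature on a schedule, with alerts feeding back into the retraining cadence rather than paging on every blip.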
7. **Scalability, Reliability, and Other Practical Considerations**
- Discuss storage and computation choices (e.g., streaming system, message queues, scalable storage for logs, feature store, and models).
- How would you design for fault tolerance, graceful degradation, and fallback behavior (e.g., if the model server is down or too slow)?
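Graceful degradation usually means bounding the model call with a deadline and falling back to a cheap, precomputed answer when it is missed. A minimal sketch using a thread pool as the deadline mechanism (the fallback list and timeout value are illustrative):

```python
import concurrent.futures
import time

POPULAR_FALLBACK = ["top1", "top2", "top3"]   # precomputed popularity list (illustrative)
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def recommend_with_fallback(model_call, user_id, timeout_s=0.1):
    """Call the (possibly slow or broken) model with a deadline; on timeout
    or any error, degrade to the static popularity list instead of failing
    the request. Returns (recs, source) so callers can log degradation rate."""
    future = _pool.submit(model_call, user_id)
    try:
        return future.result(timeout=timeout_s), "model"
    except Exception:
        return POPULAR_FALLBACK, "fallback"
```

The `source` tag matters operationally: a rising fallback rate is itself an alertable health signal for the model server, separate from request errors.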
Clearly explain your assumptions and walk through your design step by step.
Quick Answer: This question evaluates a candidate's ability to design a large-scale, low-latency real-time recommendation system end to end: feature engineering and feature-store consistency, a candidate-generation-plus-ranking architecture, cold-start handling, latency versus accuracy trade-offs, monitoring and experimentation, and operational scalability with graceful degradation.