You are asked to design a real-time recommendation system for a large-scale consumer product (for example, recommending items or content to users in a mobile app).
The system should:
- Serve personalized recommendations with **low latency** (end-to-end P95 latency target: e.g., < 100 ms from request to response at the service layer).
- Support **millions of daily active users** and **tens of thousands of candidate items** that change over time.
- Continuously incorporate new user interactions (clicks, views, purchases, etc.) to keep recommendations fresh.
Address the following aspects in your design:
- **High-level Architecture & Data Flow**
  - Describe the overall pipeline from data generation to model training to online serving.
  - Explicitly separate **offline**, **nearline**, and **online** components where applicable.
- **Feature Engineering & Feature Store**
  - What kinds of user, item, and context features would you use?
  - How would you design a **feature store** to support both offline training and online inference with consistent features?
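One common way to keep offline and online features consistent is to route both paths through the same transformation code. A minimal sketch, with hypothetical feature names and the clock passed in explicitly so that training (historical timestamps) and serving (wall clock) can each supply their own `now`:

```python
from datetime import datetime, timezone

def user_features(raw, now):
    """Shared transformation used by both the offline training pipeline and
    the online serving path, avoiding train/serve skew. Feature names are
    illustrative assumptions."""
    age_days = (now - raw["signup_time"]).days
    return {
        "account_age_days": age_days,
        "clicks_7d": raw.get("clicks_7d", 0),
        "is_new_user": age_days < 7,
    }

raw = {"signup_time": datetime(2024, 1, 1, tzinfo=timezone.utc), "clicks_7d": 3}
feats = user_features(raw, now=datetime(2024, 1, 10, tzinfo=timezone.utc))
print(feats)  # -> {'account_age_days': 9, 'clicks_7d': 3, 'is_new_user': False}
```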
- **Modeling Approach**
  - Propose a baseline model (e.g., simple heuristics or a shallow model), then a more advanced model (e.g., deep learning–based ranking).
  - Explain how you would structure the system as **candidate generation + ranking** (or another decomposition) and why.
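The candidate generation + ranking split can be sketched as two stages over embedding scores. Everything below is an illustrative assumption: in a real system stage 1 would query an approximate nearest-neighbor index over millions of items, and stage 2 would call a heavier model with many more features.

```python
import heapq

# Hypothetical in-memory item embeddings; stands in for an ANN index.
ITEM_EMBEDDINGS = {
    "item_a": [0.9, 0.1],
    "item_b": [0.2, 0.8],
    "item_c": [0.7, 0.3],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def generate_candidates(user_emb, k=2):
    """Stage 1: cheap retrieval -- top-k items by embedding similarity."""
    return heapq.nlargest(k, ITEM_EMBEDDINGS,
                          key=lambda i: dot(user_emb, ITEM_EMBEDDINGS[i]))

def rank(user_emb, candidates):
    """Stage 2: precise ordering of the small candidate set; here just a
    re-score, but in practice a much more expensive model."""
    return sorted(candidates,
                  key=lambda i: dot(user_emb, ITEM_EMBEDDINGS[i]),
                  reverse=True)

user_emb = [1.0, 0.0]
recs = rank(user_emb, generate_candidates(user_emb))
print(recs)  # -> ['item_a', 'item_c']
```

The decomposition matters because only the cheap stage ever touches the full catalog; the expensive model sees a few hundred candidates at most.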
- **Cold Start Problem**
  - How would you handle **new users** with little or no history?
  - How would you handle **new items** with no interaction data?
  - Discuss multiple strategies (e.g., content-based features, popularity-based recommendations, exploration).
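A sketch combining two of these strategies on the new-user side: a popularity fallback when there is no history, plus epsilon-greedy exploration otherwise (the threshold, list sizes, and names are illustrative assumptions; new items would additionally need content-based features to enter the candidate pool at all):

```python
import random

def recommend(user_history, personalized, popular, epsilon=0.1, rng=random):
    """Cold-start handling sketch:
    - users with no history get the popularity list (no personalization signal),
    - existing users explore random popular items with probability epsilon,
    - otherwise they get the personalized list."""
    if not user_history:
        return popular
    if rng.random() < epsilon:
        return rng.sample(popular, k=min(3, len(popular)))
    return personalized
```

For example, `recommend([], personalized, popular)` always returns the popularity list, while `epsilon=0.0` disables exploration entirely.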
- **Latency vs. Accuracy Trade-offs**
  - Given a strict latency budget, how would you design the serving path (caching, pre-computation, approximate search, etc.)?
  - Discuss concrete strategies for trading off model complexity/accuracy against serving latency and system cost.
  - Explain where you would use caching (e.g., user-level, item-level, or result-level caches) and what consistency/TTL strategies you might choose.
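For the result-level caching point, a minimal TTL cache sketch. The TTL value and the injectable clock are illustrative assumptions; a production system would more likely use Redis or a similar store with per-key TTLs.

```python
import time

class TTLCache:
    """Result-level cache: serve recently computed recommendation lists for a
    short TTL, trading freshness for latency."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable for testing
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # stale: evict and force a recompute
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)
```

A short TTL bounds staleness: within the window a user sees cached results; after expiry the next request pays the full model-serving cost and refreshes the entry.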
- **Monitoring, Evaluation, and Iteration**
  - What **online metrics** and **offline metrics** would you track to evaluate the recommender system?
  - How would you set up **A/B testing** or other online experiments?
  - Describe what you would monitor in production (e.g., model performance drift, feature distribution shift, latency, error rates) and how you would respond.
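Deterministic hash-based bucketing is the usual backbone of A/B assignment: a user must land in the same variant on every request. A sketch, assuming a hypothetical experiment name and an even split:

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Stable A/B bucketing sketch: hash (experiment, user) so each user
    stays in one variant for the experiment's lifetime, and different
    experiments bucket independently."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Logging the assigned variant alongside each impression then lets offline analysis attribute online metrics (CTR, dwell time, conversions) to the right arm.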
- **Scalability, Reliability, and Other Practical Considerations**
  - Discuss storage and computation choices (e.g., streaming systems, message queues, scalable storage for logs, features, and models).
  - How would you design for fault tolerance, graceful degradation, and fallback behavior (e.g., if the model server is down or too slow)?
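Fallback behavior can be sketched as a deadline around the model call, with a precomputed popularity list as the degraded answer. The timeout value and callable names are illustrative assumptions:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def serve(user_id, model_call, fallback, timeout_s=0.08):
    """Graceful degradation sketch: call the (hypothetical) model server with
    a deadline; on timeout or any error, return a precomputed fallback list
    instead of failing the request."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(model_call, user_id)
        try:
            return future.result(timeout=timeout_s)
        except Exception:  # deadline exceeded, connection error, model crash, ...
            return fallback

healthy = serve("u1", lambda uid: ["model_recs"], ["popular"])
degraded = serve("u1", lambda uid: time.sleep(0.3) or ["model_recs"],
                 ["popular"], timeout_s=0.02)
print(healthy, degraded)  # -> ['model_recs'] ['popular']
```

Serving something slightly worse (popular items) is almost always better than serving an error, so the fallback path should be cheap, precomputed, and independent of the model server.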
Clearly explain your assumptions and walk through your design step by step.