PracHub

Design a real-time recommendation system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in designing large-scale, low-latency real-time recommendation systems, covering feature engineering and feature-store consistency, candidate generation and ranking architectures, cold-start handling, latency versus accuracy trade-offs, monitoring, and operational scalability.

  • hard
  • Google
  • ML System Design
  • Machine Learning Engineer

Design a real-time recommendation system

Company: Google

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Onsite



Related Interview Questions

  • Design an app-store app recommendation system - Google (medium)
  • Design a chatbot over structured and unstructured data - Google (medium)
  • Design a fraud detection system - Google (medium)
  • Choose Fast or Cheap Models - Google
  • Design ML system for self-driving perception - Google (medium)
Posted: Dec 8, 2025, 6:09 PM

You are asked to design a real-time recommendation system for a large-scale consumer product (for example, recommending items or content to users in a mobile app).

The system should:

  • Serve personalized recommendations with low latency (end-to-end P95 latency target: e.g., < 100 ms from request to response at the service layer).
  • Support millions of daily active users and tens of thousands of candidate items that change over time.
  • Continuously incorporate new user interactions (clicks, views, purchases, etc.) to keep recommendations fresh.
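Before diving into the architecture, it helps to sanity-check the scale these requirements imply. A back-of-envelope sizing in Python; every number here (DAU, requests per user, peak factor, the budget split) is an illustrative assumption, not a figure from the prompt:

```python
# Back-of-envelope sizing under assumed traffic.
DAU = 20_000_000          # assumed daily active users
REQUESTS_PER_USER = 10    # assumed recommendation requests per user per day
PEAK_FACTOR = 3           # assumed peak-to-average traffic ratio

avg_qps = DAU * REQUESTS_PER_USER / 86_400   # seconds per day
peak_qps = avg_qps * PEAK_FACTOR

# One assumed way to split the 100 ms P95 budget across serving stages.
budget_ms = {"feature_lookup": 10, "candidate_gen": 20,
             "ranking": 40, "network_and_misc": 30}

print(f"avg QPS ~{avg_qps:,.0f}, peak QPS ~{peak_qps:,.0f}")
print("total latency budget:", sum(budget_ms.values()), "ms")
```

Even a rough split like this makes the design consequences concrete: a 40 ms ranking budget rules out scoring the full catalog with a heavy model, which motivates the candidate-generation + ranking decomposition asked about below.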

Address the following aspects in your design:

  1. High-level Architecture & Data Flow
    • Describe the overall pipeline from data generation to model training to online serving.
    • Explicitly separate offline, nearline, and online components where applicable.
  2. Feature Engineering & Feature Store
    • What kinds of user, item, and context features would you use?
    • How would you design a feature store to support both offline training and online inference with consistent features?
  3. Modeling Approach
    • Propose a baseline model (e.g., simple heuristics or a shallow model) and then a more advanced model (e.g., deep learning–based ranking).
    • Explain how you would structure the system as candidate generation + ranking (or another decomposition) and why.
  4. Cold Start Problem
    • How would you handle new users with little or no history?
    • How would you handle new items with no interaction data?
    • Discuss multiple strategies (e.g., content-based features, popularity-based recommendations, exploration).
  5. Latency vs. Accuracy Trade-offs
    • Given a strict latency budget, how would you design the serving path (caching, pre-computation, approximate search, etc.)?
    • Discuss concrete strategies to trade off model complexity/accuracy against serving latency and system cost.
    • Explain where you would use caching (e.g., user-level, item-level, or result-level caches) and what consistency/TTL strategies you might choose.
  6. Monitoring, Evaluation, and Iteration
    • What online metrics and offline metrics would you track to evaluate the recommender system?
    • How would you set up A/B testing or other online experiments?
    • Describe what you would monitor in production (e.g., model performance drift, feature distribution shift, latency, error rates) and how you would respond.
  7. Scalability, Reliability, and Other Practical Considerations
    • Discuss storage and computation choices (e.g., streaming system, message queues, scalable storage for logs, feature store, and models).
    • How would you design for fault tolerance, graceful degradation, and fallback behavior (e.g., if the model server is down or too slow)?

Clearly explain your assumptions and walk through your design step by step.
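For aspect 2, the central consistency requirement is that offline training reads each feature value *as of* the label event's timestamp (point-in-time correctness), while online serving reads the freshest value — both through a single write path. A toy sketch of that idea; the class and method names are hypothetical, not from any particular feature-store product:

```python
import bisect
from collections import defaultdict

class FeatureStore:
    """Toy feature store keeping timestamped values per (entity, feature).

    Training reads values as of the label event's timestamp, serving
    reads the latest value; sharing one write path keeps the two
    consistent and avoids training/serving skew.
    """
    def __init__(self):
        self._log = defaultdict(list)  # (entity, feature) -> [(ts, value)]

    def write(self, entity, feature, ts, value):
        bisect.insort(self._log[(entity, feature)], (ts, value))

    def get_as_of(self, entity, feature, ts):
        """Offline/training read: last value written at or before ts."""
        log = self._log[(entity, feature)]
        i = bisect.bisect_right(log, (ts, float("inf")))
        return log[i - 1][1] if i else None

    def get_online(self, entity, feature):
        """Online/serving read: freshest value."""
        log = self._log[(entity, feature)]
        return log[-1][1] if log else None

fs = FeatureStore()
fs.write("user42", "clicks_7d", ts=100, value=3)
fs.write("user42", "clicks_7d", ts=200, value=5)
```

A real system would back the offline path with a data warehouse and the online path with a low-latency key-value store, but the point-in-time read semantics are the part interviewers usually probe.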
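For aspect 3, the usual decomposition is a cheap candidate-generation stage over the full catalog followed by a heavier ranker over only the shortlist. A minimal pure-Python sketch, assuming embeddings from an already-trained two-tower model; catalog size, dimensions, and function names are illustrative:

```python
import random

random.seed(0)
DIM, ITEM_COUNT = 16, 1000  # assumed toy sizes

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Stand-in for item-tower embeddings produced by offline training.
item_embeddings = [[random.gauss(0, 1) for _ in range(DIM)]
                   for _ in range(ITEM_COUNT)]

def retrieve_candidates(user_vec, k=100):
    """Stage 1: cheap dot-product retrieval over the full catalog.
    In production this brute-force scan would be replaced by an
    approximate-nearest-neighbor index to stay inside the latency budget."""
    return sorted(range(ITEM_COUNT),
                  key=lambda i: dot(item_embeddings[i], user_vec),
                  reverse=True)[:k]

def rank(user_vec, candidate_ids, k=10):
    """Stage 2: stand-in for a heavier ranking model that scores only
    the retrieved candidates (here it just re-applies the dot product)."""
    return sorted(candidate_ids,
                  key=lambda i: dot(item_embeddings[i], user_vec),
                  reverse=True)[:k]

user_vec = [random.gauss(0, 1) for _ in range(DIM)]
final = rank(user_vec, retrieve_candidates(user_vec))
```

The key design property is that the expensive model's cost scales with the shortlist size (here 100), not the catalog size, which is what makes a strict latency budget reachable.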
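For aspect 4, one common pattern blends popularity-based results for low-history users with a small exploration slot that surfaces brand-new items so they can accumulate interaction data. A sketch with assumed thresholds and an assumed exploration rate:

```python
import random

def blend_for_cold_start(personalized, popular, new_items, history_len,
                         explore_rate=0.1, rng=random):
    """Cold-start sketch: users with little history lean on popularity;
    a small exploration slot occasionally surfaces a brand-new item.
    The history threshold and explore_rate are illustrative assumptions."""
    if history_len < 5:
        # Mostly popular items, a couple of tentative personalized picks.
        base = popular[:8] + personalized[:2]
    else:
        base = personalized[:10]
    if new_items and rng.random() < explore_rate:
        base[-1] = rng.choice(new_items)  # exploration slot
    return base
```

New-item cold start is handled here purely by exploration; in practice you would also give new items content-based embeddings so the retrieval stage can surface them without any interaction history.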
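For aspects 5 and 7 together, a per-user result cache with a short TTL plus a precomputed popularity fallback illustrates one way to combine caching with graceful degradation when the model server is down or too slow. The TTL, fallback list, and class name are assumptions for the sketch:

```python
import time

POPULAR_FALLBACK = ["item_1", "item_2", "item_3"]  # assumed precomputed list

class RecsCache:
    """Per-user result cache with a short TTL, degrading to a stale
    entry and then to popular items when the ranking call fails."""
    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable for testing
        self._store = {}          # user_id -> (written_at, recs)

    def get_or_compute(self, user_id, compute_fn):
        now = self.clock()
        hit = self._store.get(user_id)
        if hit and now - hit[0] < self.ttl:
            return hit[1]                      # fresh cache hit
        try:
            recs = compute_fn(user_id)         # call the model server
            self._store[user_id] = (now, recs)
            return recs
        except Exception:
            # Graceful degradation: stale entry if any, else popular items.
            return hit[1] if hit else POPULAR_FALLBACK
```

A short TTL (here 60 s) bounds staleness for users who refresh repeatedly while absorbing burst traffic; serving a stale-but-personalized list on failure is usually better than an error or an empty page.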
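For aspect 6, feature-distribution shift is commonly tracked with the population stability index (PSI) between the training-time distribution of a feature and its live serving distribution. A small sketch; the alert thresholds quoted in the docstring are a common rule of thumb, not a universal standard:

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions (lists of bin proportions).

    Common rule-of-thumb thresholds (tune per feature):
    PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 investigate.
    """
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        psi += (a - e) * math.log(a / e)
    return psi

training_dist = [0.25, 0.25, 0.25, 0.25]  # assumed bin proportions
serving_dist = [0.40, 0.30, 0.20, 0.10]
drift = population_stability_index(training_dist, serving_dist)
```

A monitoring job would compute this per feature on a rolling window and page or trigger retraining when the threshold is crossed, alongside the usual latency, error-rate, and online-metric dashboards.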

