System Design: End-to-End ML Recommendation System
Scenario
You are building an end-to-end machine-learning-powered recommendation system for a large consumer application (e.g., e-commerce). The system must serve recommendations on high-traffic surfaces (home feed, product detail pages) under strict real-time latency constraints.
Task
Design the system from data collection to real-time serving. Clearly describe:
- Data collection and governance
  - What user/item/context signals to log, how to structure event schemas, identity management, and how to prevent data leakage (see the event-schema sketch after this list).
- Feature pipelines
  - Batch and streaming feature engineering, a feature store strategy, point-in-time correctness, and training–serving consistency (see the point-in-time join sketch after this list).
- Training workflow
  - Labeling strategy, negative sampling, model architectures (e.g., retrieval + ranking), objective functions, experiment tracking, and offline evaluation (see the two-tower sketch after this list).
- Model refresh cadence
  - How frequently to update embeddings and ranking models; handling cold-start for new users/items.
- Online/offline architecture
  - Candidate generation, ranking, re-ranking, caching, vector search/ANN, and how offline components (data lake, orchestration) integrate with online serving (see the ANN retrieval sketch after this list).
- Real-time latency requirements
  - An end-to-end p95/p99 latency budget and techniques to meet it (see the per-stage timeout sketch after this list).
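As an illustrative sketch of the kind of event schema the data-collection item asks for (field names and event types are assumptions, not a prescribed standard):

```python
from dataclasses import dataclass, field
from typing import Optional
import time
import uuid


@dataclass
class InteractionEvent:
    """One logged user-item interaction (impression, click, add-to-cart, purchase).

    Field names are illustrative; the key properties are a stable event id,
    a server-side timestamp, and explicit surface/position context so labels
    can be joined back to the serving request without leakage.
    """
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    event_type: str = "impression"          # impression | click | add_to_cart | purchase
    ts_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    user_id: Optional[str] = None           # resolved identity, if logged in
    anonymous_id: Optional[str] = None      # device/cookie id for logged-out traffic
    item_id: str = ""
    surface: str = "home_feed"              # home_feed | product_detail | search
    position: int = 0                       # rank at which the item was shown
    request_id: str = ""                    # joins the event to the serving request
    model_version: str = ""                 # which model produced the ranking
```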
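Point-in-time correctness, mentioned under feature pipelines, can be illustrated with an as-of join: each training example may only see feature values computed at or before the event timestamp. A minimal pandas sketch (table and column names are assumptions):

```python
import pandas as pd

# Labeled events (e.g., clicks) with the timestamp at which they occurred.
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "event_ts": pd.to_datetime(["2024-01-10", "2024-01-20", "2024-01-15"]),
    "label": [1, 0, 1],
})

# Periodic snapshots of a user feature (e.g., 7-day click count).
features = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "feature_ts": pd.to_datetime(["2024-01-05", "2024-01-18", "2024-01-12"]),
    "clicks_7d": [3, 9, 1],
})

# merge_asof picks, per event, the latest feature row with feature_ts <= event_ts,
# which prevents post-event information from leaking into training data.
train = pd.merge_asof(
    events.sort_values("event_ts"),
    features.sort_values("feature_ts"),
    left_on="event_ts",
    right_on="feature_ts",
    by="user_id",
    direction="backward",
)
print(train)
```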
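For the retrieval + ranking split under the training workflow, one common pattern is a two-tower retrieval model trained with in-batch negatives. A minimal PyTorch sketch (vocabulary sizes, dimensions, and the temperature are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoTower(nn.Module):
    """Embeds users and items into a shared space; dot product scores relevance."""

    def __init__(self, n_users: int, n_items: int, dim: int = 64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def forward(self, user_ids: torch.Tensor, item_ids: torch.Tensor):
        u = F.normalize(self.user_emb(user_ids), dim=-1)
        v = F.normalize(self.item_emb(item_ids), dim=-1)
        return u, v


def in_batch_softmax_loss(u: torch.Tensor, v: torch.Tensor, temperature: float = 0.05):
    """Treat the other items in the batch as negatives for each (user, item) pair."""
    logits = u @ v.t() / temperature          # [B, B] similarity matrix
    targets = torch.arange(u.size(0))         # the diagonal holds the positives
    return F.cross_entropy(logits, targets)


model = TwoTower(n_users=10_000, n_items=50_000)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One toy training step on random (user, clicked item) pairs.
users = torch.randint(0, 10_000, (256,))
items = torch.randint(0, 50_000, (256,))
u, v = model(users, items)
loss = in_batch_softmax_loss(u, v)
loss.backward()
opt.step()
print(float(loss))
```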
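For candidate generation in the online path, the item-tower embeddings are typically loaded into an ANN index and queried with the user embedding at request time. A sketch using FAISS as one possible library (the flat index type and k=200 are assumptions; real deployments usually use IVF or HNSW indexes):

```python
import numpy as np
import faiss

dim = 64
n_items = 50_000

# Item embeddings exported from the trained item tower (random stand-ins here).
item_vecs = np.random.rand(n_items, dim).astype("float32")
faiss.normalize_L2(item_vecs)                 # cosine similarity via inner product

index = faiss.IndexFlatIP(dim)                # exact search; swap for IVF/HNSW at scale
index.add(item_vecs)

# At request time: embed the user and fetch top-k candidates for the ranker.
user_vec = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(user_vec)
scores, item_ids = index.search(user_vec, 200)
print(item_ids[0][:10], scores[0][:10])
```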
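One way to make the p95/p99 budget concrete is to give each stage its own timeout and degrade rather than block when a stage overruns; the 5/10/25 ms stage budgets below are illustrative assumptions, not requirements. A minimal asyncio sketch:

```python
import asyncio


async def fetch_features(user_id: str) -> dict:
    await asyncio.sleep(0.002)                # placeholder for a feature-store lookup
    return {"user_id": user_id}


async def retrieve_candidates(features: dict) -> list[int]:
    await asyncio.sleep(0.004)                # placeholder for ANN retrieval
    return list(range(200))


async def rank(features: dict, candidates: list[int]) -> list[int]:
    await asyncio.sleep(0.010)                # placeholder for the heavy ranking model
    return sorted(candidates)


async def recommend(user_id: str) -> list[int]:
    """Enforce per-stage budgets; fall back to retrieval order if ranking overruns."""
    features = await asyncio.wait_for(fetch_features(user_id), timeout=0.005)
    candidates = await asyncio.wait_for(retrieve_candidates(features), timeout=0.010)
    try:
        return await asyncio.wait_for(rank(features, candidates), timeout=0.025)
    except asyncio.TimeoutError:
        # Graceful degradation: serve retrieval order rather than failing the request.
        return candidates


print(asyncio.run(recommend("u1"))[:10])
```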
Additionally address
- Feedback loops and exploration vs. exploitation (see the epsilon-greedy sketch after this list).
- A/B testing and experiment guardrails (see the bucketing sketch after this list).
- Fallback logic and graceful degradation (see the fallback-chain sketch after this list).
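For exploration vs. exploitation, a simple baseline is an epsilon-greedy re-rank that occasionally promotes a lower-ranked candidate so the feedback loop sees items the current model would otherwise never surface. A sketch (the epsilon value and explore slot are assumptions):

```python
import random


def epsilon_greedy_rerank(ranked_items: list[str], epsilon: float = 0.05,
                          explore_slot: int = 2) -> list[str]:
    """With probability epsilon, swap a random lower-ranked item into explore_slot."""
    items = list(ranked_items)
    if len(items) > explore_slot + 1 and random.random() < epsilon:
        j = random.randrange(explore_slot + 1, len(items))
        items[explore_slot], items[j] = items[j], items[explore_slot]
    return items


print(epsilon_greedy_rerank([f"item_{i}" for i in range(10)]))
```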
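A/B assignment is usually deterministic so a user sees a consistent variant across sessions; hashing the user id with an experiment salt gives a stable bucket. A sketch (the bucket count and variant names are assumptions):

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("control", "treatment")) -> str:
    """Deterministically map (user, experiment) to a variant via a salted hash."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 1000           # 1000 fine-grained buckets
    return variants[bucket % len(variants)]


# Assignment is stable across calls for the same user and experiment.
assert assign_variant("u1", "ranking_v2") == assign_variant("u1", "ranking_v2")
print(assign_variant("u1", "ranking_v2"), assign_variant("u2", "ranking_v2"))
```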
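Fallback logic can be expressed as an ordered chain of sources: personalized results, then a cached popularity list, then a static default, with the first non-empty, non-erroring source winning. A sketch (the source names and items are assumptions):

```python
from typing import Callable, Iterable


def recommend_with_fallback(sources: Iterable[Callable[[], list[str]]]) -> list[str]:
    """Try each source in order; an exception or empty result falls through."""
    for source in sources:
        try:
            items = source()
            if items:
                return items
        except Exception:
            continue
    return []


def personalized() -> list[str]:
    raise TimeoutError("ranking service overloaded")   # simulate an outage


def cached_popular() -> list[str]:
    return ["item_42", "item_7", "item_13"]


print(recommend_with_fallback([personalized, cached_popular, lambda: ["default_item"]]))
```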