PracHub

Design an End-to-End ML System

Last updated: Mar 29, 2026

Quick Overview

This question tests a candidate's ability to design an end-to-end machine-learning system for a real-time recommendation service. It covers data collection and event pipelines, feature engineering and feature stores, model training and retraining, online serving architecture, monitoring, and operational constraints.


Company: OpenAI

Role: Software Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

##### Question

Design an end-to-end machine-learning system for a real-time recommendation product. Explain data collection, feature engineering, model training, online serving, monitoring, and scalability considerations.

Related Interview Questions

  • Design a Text-to-Video Generation Service - OpenAI (medium)
  • Design a Text-to-Video Generation System - OpenAI (hard)
  • Design a Real-Time Sensor Intelligence System - OpenAI (medium)
  • Mine Novel Images from Unlabeled Data - OpenAI (medium)
  • Design a GPU-Efficient Video Service - OpenAI (medium)
Aug 4, 2025, 10:55 AM

System Design: Real-Time Recommendation ML System

Context

You are tasked with designing an end-to-end machine-learning system that serves real-time recommendations in a consumer-facing product (e.g., feed, products, videos). The system must handle high read traffic and evolving content and user behavior.

Assumptions (you may refine during the interview):

  • Traffic: ~10k QPS; p95 latency target ≤ 150 ms for recommendation API
  • Inventory: 10M items; daily new/expiring items
  • Feedback: clicks, likes, purchases; implicit and explicit signals
  • Privacy: user consent, PII minimization, right-to-erasure compliance
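
The traffic and inventory assumptions above translate into rough capacity numbers that are useful to state early in the interview. The following is a back-of-envelope sketch only: `QPS`, `P95_LATENCY_S`, and `NUM_ITEMS` come from the assumptions, while `EMBEDDING_DIM`, `BYTES_PER_FLOAT`, and `CANDIDATES_PER_REQUEST` are hypothetical values chosen for illustration and are not part of the question.

```python
# Values taken from the stated assumptions.
QPS = 10_000                 # ~10k requests per second
P95_LATENCY_S = 0.150        # p95 latency target of 150 ms
NUM_ITEMS = 10_000_000       # 10M items in the inventory

# Hypothetical values, chosen only to make the arithmetic concrete.
EMBEDDING_DIM = 64           # assumed item-embedding width
BYTES_PER_FLOAT = 4          # float32
CANDIDATES_PER_REQUEST = 500 # assumed size of the candidate set sent to the ranker


def sizing() -> dict:
    """Return rough capacity estimates derived from the constants above."""
    return {
        # Total recommendation requests per day at a steady 10k QPS.
        "requests_per_day": QPS * 86_400,
        # Little's law (in-flight = arrival rate * latency), using the p95
        # target as a pessimistic stand-in for the mean latency.
        "in_flight_upper_bound": QPS * P95_LATENCY_S,
        # Memory for one float32 embedding per item, in decimal gigabytes.
        "item_embedding_gb": NUM_ITEMS * EMBEDDING_DIM * BYTES_PER_FLOAT / 1e9,
        # Number of (user, item) pairs the ranker must score each second.
        "ranker_scores_per_second": QPS * CANDIDATES_PER_REQUEST,
    }


if __name__ == "__main__":
    for name, value in sizing().items():
        print(f"{name}: {value:,}")
```

Under these assumed values, the item embeddings fit comfortably in the memory of a single machine, while the ranker throughput (millions of scores per second) is the number most likely to drive the serving fleet size. Different embedding widths or candidate counts change these figures proportionally.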

Requirements

Explain and justify the design for each of the following:

  1. Data collection and event pipeline
  2. Feature engineering and feature store (offline and online)
  3. Model training, labeling, and retraining strategy
  4. Online serving architecture (candidate generation, ranking, re-ranking)
  5. Monitoring, alerting, and experimentation
  6. Scalability, reliability, and cost considerations
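
The three serving stages named in requirement 4 (candidate generation, ranking, re-ranking) can be sketched as a small in-memory pipeline. This is a minimal illustration, not a reference solution: the `Item` type, the `popularity` feature, the 0.8/0.2 score blend, and the per-category cap are all hypothetical. The brute-force dot product stands in for an approximate-nearest-neighbor index, and the linear blend stands in for a learned ranking model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    embedding: tuple[float, ...]
    popularity: float  # hypothetical precomputed feature in [0, 1]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_embedding, items, k):
    """Stage 1: cheap recall. Keep the k items most similar to the user."""
    by_similarity = sorted(
        items, key=lambda it: dot(user_embedding, it.embedding), reverse=True
    )
    return by_similarity[:k]


def rank(user_embedding, candidates):
    """Stage 2: richer scoring over the small candidate set."""
    scored = [
        (0.8 * dot(user_embedding, it.embedding) + 0.2 * it.popularity, it)
        for it in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [it for _, it in scored]


def rerank(ranked, n, max_per_category):
    """Stage 3: business rules. Cap items per category for diversity."""
    result, counts = [], {}
    for it in ranked:
        if counts.get(it.category, 0) < max_per_category:
            result.append(it)
            counts[it.category] = counts.get(it.category, 0) + 1
        if len(result) == n:
            break
    return result


def recommend(user_embedding, items, k=4, n=3, max_per_category=2):
    """Run all three stages and return the final list of n items (or fewer)."""
    candidates = generate_candidates(user_embedding, items, k)
    return rerank(rank(user_embedding, candidates), n, max_per_category)
```

The point of the staged design is cost: the cheap first stage narrows 10M items to a few hundred, so the expensive model in the second stage only runs on that small set, which is what makes the 150 ms p95 target plausible.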

