PracHub

Design a low-latency RAG system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates production ML system design competencies, specifically retrieval-augmented generation (RAG) architecture, latency and cost optimization, indexing and retrieval strategies, caching, re-ranking, safety and operational reliability, and evaluation metrics.

Company: OpenAI

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Design a production-grade retrieval-augmented generation (RAG) system for a customer-support assistant with strict latency (target p99 ≤ 1.5 s) and cost constraints. Specify the end-to-end architecture: document ingestion, chunking strategy, embedding model choice, index type (e.g., vector, BM25, or hybrid), caching layers, re-ranking, prompt orchestration, and safety/guardrail components. Describe how you will handle document updates and deletions, multi-tenant data isolation, and failure modes (e.g., empty retrieval, timeouts, stale cache). Propose offline and online evaluation: define key metrics (answer accuracy, hallucination rate, latency, cost), design an experiment plan (e.g., BM25 + cross-encoder re-ranker vs dense-only), and outline an A/B test. Provide a scaling plan and back-of-the-envelope capacity estimates for 10 million documents and 50 QPS, including index sizing, throughput, and cost. Finally, explain what you would prioritize and de-scope first if you had only 15 minutes to present a coherent plan under interview time pressure.

Jul 27, 2025, 12:00 AM
System Design: Production-Grade RAG for Customer Support (p99 ≤ 1.5 s)

Goal

Design a production-ready retrieval-augmented generation (RAG) system for a customer-support assistant with strict latency (target p99 ≤ 1.5 s) and cost constraints.
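One way to make the 1.5 s target concrete is to assign each stage of the request path a hard timeout and check that the timeouts sum to less than the target. The sketch below does only that arithmetic. The stage names and millisecond values are illustrative assumptions, not part of the question; only the 1.5 s target comes from the text above.

```python
# Target from the problem statement: p99 <= 1.5 s.
P99_TARGET_MS = 1500

# ASSUMPTION: hypothetical stages and hard timeouts (ms), chosen for illustration.
# If every stage enforces its timeout and falls back when it fires, the sum of
# the timeouts is an upper bound on end-to-end latency.
STAGE_TIMEOUTS_MS = {
    "cache_lookup": 20,
    "query_embedding": 50,
    "retrieval": 100,
    "rerank": 150,
    "generation": 1000,
    "guardrails": 80,
}


def budget_headroom_ms(stage_timeouts: dict[str, int], target_ms: int) -> int:
    """Return the slack left after summing all stage timeouts (negative if over budget)."""
    return target_ms - sum(stage_timeouts.values())


def largest_stage(stage_timeouts: dict[str, int]) -> str:
    """Return the stage holding the biggest share of the budget."""
    return max(stage_timeouts, key=stage_timeouts.get)


headroom = budget_headroom_ms(STAGE_TIMEOUTS_MS, P99_TARGET_MS)
print(f"headroom: {headroom} ms, dominant stage: {largest_stage(STAGE_TIMEOUTS_MS)}")
```

Under these assumed numbers, generation dominates the budget, which is typically where streaming, shorter prompts, or a smaller model would be considered first.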

Requirements

  1. End-to-end architecture:
    • Document ingestion
    • Chunking strategy
    • Embedding model choice
    • Index type (vector, BM25, or hybrid)
    • Caching layers
    • Re-ranking
    • Prompt orchestration
    • Safety/guardrail components
  2. Data management and reliability:
    • Handling document updates and deletions
    • Multi-tenant data isolation
    • Failure modes (empty retrieval, timeouts, stale cache) and fallbacks
  3. Evaluation:
    • Offline and online metrics (answer accuracy, hallucination rate, latency, cost)
    • Experiment plan (e.g., BM25 + cross-encoder re-ranker vs dense-only)
    • A/B test outline
  4. Scaling plan and capacity estimates for 10 million documents and 50 QPS:
    • Index sizing, throughput, and cost
  5. Prioritization under time pressure:
    • What to cover vs. de-scope first in a 15-minute interview presentation
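The capacity estimate in requirement 4 can be sketched as a short calculation. Only the 10 million documents, 50 QPS, and 1.5 s figures come from the question. The chunks-per-document count, embedding dimension, and float32 storage are assumptions chosen for illustration and should be replaced with measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    num_docs: int = 10_000_000     # from the problem statement
    qps: float = 50.0              # from the problem statement
    latency_budget_s: float = 1.5  # p99 target from the problem statement
    chunks_per_doc: int = 5        # ASSUMPTION: average chunks per document
    embedding_dim: int = 768       # ASSUMPTION: embedding dimensionality
    bytes_per_component: int = 4   # ASSUMPTION: float32 vectors, no quantization


def num_vectors(w: Workload) -> int:
    """Total chunk embeddings stored in the vector index."""
    return w.num_docs * w.chunks_per_doc


def raw_index_bytes(w: Workload) -> int:
    """Raw vector payload only; excludes ANN graph overhead, metadata, and replicas."""
    return num_vectors(w) * w.embedding_dim * w.bytes_per_component


def queries_per_day(w: Workload) -> int:
    """Daily query volume at a constant QPS."""
    return int(w.qps * 86_400)


def max_in_flight(w: Workload) -> float:
    """Little's law: concurrent requests if every request took the full latency budget."""
    return w.qps * w.latency_budget_s


w = Workload()
print(f"vectors:         {num_vectors(w):,}")
print(f"raw index size:  {raw_index_bytes(w) / 1e9:.1f} GB")
print(f"queries per day: {queries_per_day(w):,}")
print(f"max in flight:   {max_in_flight(w):.0f}")
```

With these assumptions the raw vectors come to roughly 154 GB, so quantization or sharding would be a natural discussion point. Daily cost follows by multiplying queries per day by a measured per-query cost, which is deliberately left out here because no price is given in the question.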
