PracHub

Design a production RAG system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design an end-to-end Retrieval-Augmented Generation (RAG) system for enterprise document QA. It covers ingestion and chunking, embedding strategies, vector indexing and ANN, hybrid retrieval and re-ranking, prompt orchestration, safety/PII controls, multilingual support, scalability, observability, API design, and rollout planning. It is commonly asked in ML System Design and information retrieval/NLP interviews to assess how well a candidate reasons about trade-offs between recall, latency, cost, and compliance, and it tests both high-level architectural judgment and implementation-level production considerations.

  • hard
  • OpenAI
  • ML System Design
  • Machine Learning Engineer


Company: OpenAI

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Design a production retrieval-augmented generation (RAG) system for enterprise document QA. Specify the end-to-end architecture and key choices: ingestion and chunking strategy (window/stride, metadata, tables/code handling), embedding model selection and dimensionality, ANN index type and parameters (e.g., HNSW/IVF, recall/latency trade-offs), retrieval pipeline (BM25 hybrid, filters, time decay, multi-vector, cross-encoder re-ranking), prompt orchestration (grounding, citations, tool calls), context packing and deduplication, hallucination mitigation (attribution checks, answerability thresholds, refusal policy), caching layers (query/result/vector), freshness and incremental updates, multilingual handling, and safety/PII redaction. Detail scalability (sharding, replication, vector store choice), latency budgets and SLAs, observability (retrieval/answer quality metrics, drift monitoring), offline/online evaluation (gold sets, synthetic data, A/B tests), human feedback loops, cost controls, and fallback strategies when retrieval is weak. Provide an API design, a data schema for documents/embeddings, and a rollout plan.
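To make the "window/stride" part of the chunking question concrete, here is a minimal sketch of sliding-window chunking over a pre-tokenized document. The names `Chunk` and `chunk_tokens`, the default sizes, and the whitespace-joined text are illustrative assumptions, not part of the question; a real pipeline would use the embedding model's tokenizer and would treat tables and code separately.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    index: int  # position of the chunk within the document
    start: int  # offset of the first token (inclusive)
    end: int    # offset one past the last token (exclusive)
    text: str


def chunk_tokens(doc_id, tokens, window=200, stride=150):
    """Split `tokens` into windows of up to `window` tokens, advancing by `stride`.

    Consecutive chunks overlap by `window - stride` tokens, so a sentence that
    straddles a boundary still appears whole in at least one chunk when it is
    shorter than the overlap. The default sizes are placeholders, not tuned values.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + window, len(tokens))
        text = " ".join(tokens[start:end])
        chunks.append(Chunk(doc_id, len(chunks), start, end, text))
        if end == len(tokens):
            break  # the last window reached the end of the document
        start += stride
    return chunks
```

Keeping `start` and `end` offsets on each chunk is one way to support citations and deduplication later, since overlapping chunks from the same document can be detected by comparing their ranges.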


Aug 11, 2025, 12:00 AM

Design a Production RAG System for Enterprise Document QA

Context

You are designing a Retrieval-Augmented Generation (RAG) system to answer questions over large, evolving enterprise document corpora (policies, specs, tickets, wikis, PDFs, spreadsheets, code snippets). The system must support access controls, multilingual content, and strong safety/PII guarantees.

Requirements

Specify the end-to-end architecture and key design choices for:

  1. Ingestion and Chunking
     • Connectors (file stores, wikis, ticketing systems), parsing (PDF/Office/HTML), and normalization
     • Chunking strategy: window size, stride, hierarchical metadata, handling of tables and code
  2. Embeddings
     • Model selection (monolingual vs multilingual), dimensionality, normalization, multi-vector strategy (title/body/table/code)
  3. Indexing and ANN
     • Vector store choice; ANN algorithm (HNSW/IVF/IVF-PQ) and parameters
     • Recall/latency/cost trade-offs; sharding and replication
  4. Retrieval Pipeline
     • Hybrid retrieval (BM25 + dense), filters (ACLs, metadata), time decay, multi-vector fusion
     • Re-ranking (cross-encoder), multi-stage retrieval, answerability scoring
  5. Prompt Orchestration
     • Grounding and citations, tool/function calls, context packing and deduplication
  6. Hallucination Mitigation
     • Attribution checks, coverage thresholds, refusal policy
  7. Caching and Freshness
     • Query/result/vector caches; invalidation; incremental updates and rebuilds
  8. Multilingual and Safety
     • Language detection and cross-lingual retrieval; PII redaction and policy enforcement
  9. Scalability, Latency, and SLAs
     • Capacity planning, concurrency, tail-latency budgets, vector store scaling
  10. Observability and Evaluation
     • Metrics (retrieval/answer quality), drift monitoring, offline gold sets, synthetic data, online A/B tests
  11. Human Feedback and Cost Controls
     • Feedback loops, active learning, budget-aware retrieval/generation
  12. Fallback Strategies
     • When retrieval is weak: clarification, escalation, graceful refusal
  13. API Design and Data Schema
     • REST/JSON APIs; schemas for documents, chunks, embeddings, and citations
  14. Rollout Plan
     • Staging, backfill, canary, monitoring, and incident playbooks
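As an illustration of the Retrieval Pipeline requirement (hybrid BM25 + dense retrieval, ACL filters, time decay), the sketch below combines two ranked lists. It assumes reciprocal rank fusion (RRF) as the fusion method and an exponential half-life decay; the question names neither, and other choices such as weighted score blending are equally valid answers. `Candidate`, `rrf_fuse`, `hybrid_retrieve`, and all parameter defaults are hypothetical names and values for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    chunk_id: str
    allowed_groups: frozenset  # ACL: groups permitted to read this chunk
    age_days: float            # days since the source document last changed


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return scores


def hybrid_retrieve(bm25_ranking, dense_ranking, candidates, user_groups,
                    half_life_days=180.0, top_k=5):
    """Fuse a BM25 ranking and a dense ranking into one list of chunk ids.

    `candidates` maps chunk_id -> Candidate. Chunks the user cannot read are
    dropped before fusion so they can never reach the prompt.
    """
    # ACL filter: keep chunks that share at least one group with the user.
    visible = {cid for cid, c in candidates.items()
               if c.allowed_groups & user_groups}
    filtered = [[cid for cid in ranking if cid in visible]
                for ranking in (bm25_ranking, dense_ranking)]
    fused = rrf_fuse(filtered)
    # Time decay: halve a chunk's fused score every `half_life_days`.
    decayed = {cid: score * 0.5 ** (candidates[cid].age_days / half_life_days)
               for cid, score in fused.items()}
    # Sort by score descending, breaking ties by chunk id for determinism.
    return sorted(decayed, key=lambda cid: (-decayed[cid], cid))[:top_k]
```

In a production system the ACL filter would more likely be pushed down into the BM25 and vector queries themselves, so that filtered-out chunks do not consume the top-k budget of either retriever; the output of this stage would then feed a cross-encoder re-ranker.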

