PracHub

Design an enterprise RAG system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in ML system design, specifically retrieval-augmented generation, large-scale ingestion and indexing, vector search and hybrid retrieval, multi-tenant isolation, key management, and compliance-aware data handling.

Company: OpenAI

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Design a retrieval-augmented generation (RAG) system for enterprise users. Requirements: multi-tenant isolation and authorization; ingestion of heterogeneous documents (PDF, HTML, emails, spreadsheets) at up to 10M docs/day; near-real-time freshness (<5 minutes from arrival to searchable); P50 latency ≤800 ms and P95 ≤2 s per query; strong PII handling (encryption at rest/in transit, redaction); budget constraints per 1k queries. Describe the end-to-end architecture: ingestion, parsing/chunking, metadata extraction, embeddings pipeline, vector index selection and sharding, hybrid (sparse+dense) retrieval, re-ranking, prompt orchestration and context window management, generator selection, and response post-processing. Address evaluation and offline/online metrics, feedback loops and active learning, hallucination mitigation (citation grounding, filters), guardrails/safety, caching, observability (tracing, drift, recall@k dashboards), capacity planning and autoscaling, disaster recovery, and deployment options (cloud vs on-prem). Justify trade-offs among accuracy, latency, and cost, and outline a plan to run A/B experiments before rollout.

Jul 31, 2025, 12:00 AM

System Design: Retrieval-Augmented Generation (RAG) for Enterprise

Context

Design a production-grade, multi-tenant RAG platform for enterprise users. The system must ingest and index heterogeneous internal documents and serve secure, low-latency, cost-efficient, and accurate answers backed by citations.

Assume the following:

  • Scale: Up to 10 million documents per day across all tenants.
  • Query load: Moderate to high (varies by tenant); design to autoscale.
  • Content types: PDFs, HTML/web pages, emails, spreadsheets, and plain text.
  • Tenancy: Strong isolation with per-tenant authorization and key management.
  • Compliance: PII/PHI is likely present; some tenants may require data residency.
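
To make the scale assumption concrete, the following back-of-envelope sketch converts 10M docs/day into per-second rates and a raw vector volume. The chunks-per-document count, peak factor, embedding dimension, and float width are illustrative assumptions, not values given in the question.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rates(docs_per_day, chunks_per_doc, peak_factor,
                 embedding_dim, bytes_per_float=4):
    """Rough ingestion sizing; every input except docs_per_day is assumed."""
    avg_docs_per_s = docs_per_day / SECONDS_PER_DAY
    peak_docs_per_s = avg_docs_per_s * peak_factor
    vectors_per_day = docs_per_day * chunks_per_doc
    return {
        "avg_docs_per_s": avg_docs_per_s,
        "peak_docs_per_s": peak_docs_per_s,
        "peak_chunks_per_s": peak_docs_per_s * chunks_per_doc,
        "vectors_per_day": vectors_per_day,
        # Uncompressed float vectors only; ignores index overhead and replicas.
        "raw_vector_gb_per_day":
            vectors_per_day * embedding_dim * bytes_per_float / 1e9,
    }


# Assumed: 20 chunks per document, 3x peak-to-average, 768-dim float32 vectors.
rates = ingest_rates(10_000_000, chunks_per_doc=20, peak_factor=3.0,
                     embedding_dim=768)
for name, value in rates.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumed values the average works out to roughly 116 documents per second, so the embedding stage would need to sustain several thousand chunks per second at peak.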

Requirements

  • Multi-tenant isolation and authorization.
  • Ingestion throughput: Up to 10M docs/day.
  • Freshness: < 5 minutes from document arrival to searchable.
  • Latency SLOs: P50 ≤ 800 ms, P95 ≤ 2 s per query.
  • PII handling: Encryption in transit/at rest, detection/redaction/de-identification.
  • Budget: Bounded cost per 1k queries (optimize and justify).
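
One way to read the isolation and authorization requirement is as a hard filter applied to retrieved chunks before anything reaches the prompt. The sketch below assumes each chunk carries a tenant ID and a set of allowed groups; the `Chunk` type, its field names, and `authorized_chunks` are hypothetical and exist only for illustration. In practice the same predicate would usually also be pushed down into the index query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical retrieved chunk with access-control metadata."""
    chunk_id: str
    tenant_id: str
    allowed_groups: frozenset
    text: str


def authorized_chunks(candidates, tenant_id, user_groups):
    """Keep only chunks from the caller's tenant that share a group with the user."""
    groups = frozenset(user_groups)
    return [
        c for c in candidates
        if c.tenant_id == tenant_id and c.allowed_groups & groups
    ]


candidates = [
    Chunk("a1", "acme", frozenset({"finance"}), "Q3 forecast"),
    Chunk("a2", "acme", frozenset({"hr"}), "Salary bands"),
    Chunk("b1", "globex", frozenset({"finance"}), "Other tenant's forecast"),
]
visible = authorized_chunks(candidates, "acme", ["finance", "eng"])
print([c.chunk_id for c in visible])
```

Filtering after retrieval is a defense-in-depth step; relying on it alone would waste retrieval budget on chunks the user can never see.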

Deliverables

Describe the end-to-end architecture and justify trade-offs among accuracy, latency, and cost.

Include:

  1. Ingestion, parsing/chunking, metadata extraction, embeddings pipeline.
  2. Vector index selection, sharding strategy, and hybrid retrieval (sparse + dense).
  3. Re-ranking, prompt orchestration, context window management, generator selection.
  4. Response post-processing (citations, formatting, redaction), hallucination mitigation.
  5. Evaluation: Offline/online metrics, feedback loops, active learning.
  6. Guardrails/safety, caching, observability (tracing, drift, recall@k dashboards).
  7. Capacity planning and autoscaling; disaster recovery; deployment options (cloud vs on-prem).
  8. A plan to run A/B experiments before rollout.
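
For the hybrid retrieval and recall@k items, a candidate might sketch how sparse and dense result lists are merged and then scored. The example below uses reciprocal rank fusion (RRF), which is one common fusion choice and not something the question prescribes; the constant `k=60`, the document IDs, and the relevance labels are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


# Assumed outputs of a sparse (e.g. BM25) and a dense (vector) retriever.
sparse_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)
print(recall_at_k(fused, relevant={"d3", "d4"}, k=2))
```

RRF needs only ranks, not comparable scores, which is why it is a convenient baseline when the sparse and dense retrievers score on different scales. Tracking recall@k on a labeled query set before and after a retrieval change is one way to feed the dashboards mentioned in item 6.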
