PracHub

Design enterprise RAG search system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates an engineer's ability to design end-to-end Retrieval-Augmented Generation (RAG) search systems for enterprise settings. It tests competencies in ML system design, information retrieval (dense, sparse, and hybrid), vector and sparse indexing, data ingestion and enrichment, LLM selection and grounding, security and compliance, scalability, and observability. Interviewers commonly use it to assess architectural reasoning and trade-off analysis for production ML services, in particular how candidates balance latency, freshness, multi-tenant isolation, and operational concerns. It belongs to the ML System Design domain and requires both high-level conceptual understanding and practical, application-level design detail.

  • hard
  • OpenAI
  • ML System Design
  • Machine Learning Engineer

Company: OpenAI

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Question: Design an end-to-end Retrieval-Augmented Generation (RAG) search system for enterprise users, covering architecture, data ingestion, retriever and generator selection, indexing, latency, security, and scalability.

Aug 4, 2025, 10:55 AM

Design an End-to-End Enterprise RAG Search System

Background

You are tasked with designing a Retrieval-Augmented Generation (RAG) search system for enterprise users. The system should allow employees to ask natural-language questions and receive grounded, cited answers using their organization’s private documents and tools.

Assume a multi-tenant, cloud-hosted setup with strict security and compliance requirements. Content spans PDFs, Office docs, wikis, tickets, chats, and databases. Scale assumptions (adjust as needed):

  • 1,000+ active users; 10–100 queries/sec peak.
  • 10–100 million text chunks indexed across tenants; up to 1 million new/updated documents per day.
  • Data freshness target: under 5 minutes from change to searchable.
  • Latency SLO: P50 ≤ 1.5s, P95 ≤ 3s for typical questions; streaming responses acceptable.
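To make the scale assumptions above concrete, a quick back-of-envelope calculation can help size ingestion throughput and raw vector storage. This is only a sketch: the embedding dimension (1024) and float32 storage are assumptions not stated in the question, and the helper names are hypothetical. Index overhead, replication, and metadata are ignored.

```python
# Back-of-envelope sizing from the stated scale assumptions.
# ASSUMPTIONS (not in the question): 1024-dim embeddings stored as float32.
# Function names are hypothetical, for illustration only.

SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rate_per_sec(docs_per_day: int) -> float:
    """Average document updates per second if load is spread evenly over a day."""
    return docs_per_day / SECONDS_PER_DAY


def raw_vector_bytes(num_chunks: int, dim: int = 1024, bytes_per_value: int = 4) -> int:
    """Raw embedding storage, excluding index overhead, replicas, and metadata."""
    return num_chunks * dim * bytes_per_value


# Up to 1 million new/updated documents per day (from the question).
avg_rate = ingest_rate_per_sec(1_000_000)

# 10 to 100 million chunks (from the question), at the assumed dimension.
low_gb = raw_vector_bytes(10_000_000) / 1e9
high_gb = raw_vector_bytes(100_000_000) / 1e9

print(f"average ingest rate: {avg_rate:.1f} docs/sec")
print(f"raw vector storage: {low_gb:.1f} GB to {high_gb:.1f} GB")
```

Under these assumptions the average ingest rate is roughly 11.6 documents per second, and raw vectors alone occupy about 41 GB to 410 GB. Real traffic is bursty, so the pipeline would need headroom above the average to keep the 5-minute freshness target.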

Task

Design the system and cover the following:

  1. Architecture: High-level components and request/response flow (ingestion, indexing, retrieval, generation, observability).
  2. Data ingestion: Connectors, parsing/OCR, normalization, chunking, metadata/ACLs, dedup/versioning, enrichment (embeddings, entities), and freshness.
  3. Retriever and generator selection: Dense vs. sparse vs. hybrid retrieval, reranking, LLM choice, grounding, citations.
  4. Indexing: Vector/sparse index choices, schema, sharding/partitioning, filters, and update strategies.
  5. Latency: End-to-end budgets by stage, caching, and performance optimizations.
  6. Security and privacy: AuthN/Z, multi-tenancy/isolation, encryption, audit, prompt-injection defenses, data handling.
  7. Scalability and operations: Horizontal scaling, backfills/re-embeddings, monitoring/eval, cost controls, failure modes, and rollouts.

Include key trade-offs and minimal diagrams-in-words (a clear component-by-component description is sufficient).
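As one illustration of how items 3 and 6 interact, the sketch below fuses a dense and a sparse ranking with reciprocal rank fusion (RRF) after dropping documents the user is not allowed to see. RRF is just one possible fusion method; the question does not prescribe it. The function names, the group-based ACL model, and the constant k = 60 are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def is_visible(doc_acl: Set[str], user_groups: Set[str]) -> bool:
    """Hypothetical ACL check: visible if the user shares a group with the doc."""
    return bool(doc_acl & user_groups)


def rrf_fuse(rankings: Iterable[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(
    dense_hits: List[str],
    sparse_hits: List[str],
    acls: Dict[str, Set[str]],
    user_groups: Set[str],
    top_k: int = 3,
) -> List[str]:
    """Filter each candidate list by ACL first, then fuse and truncate."""
    def visible(ids: List[str]) -> List[str]:
        return [d for d in ids if is_visible(acls.get(d, set()), user_groups)]

    return rrf_fuse([visible(dense_hits), visible(sparse_hits)])[:top_k]


# Toy data: doc "c" belongs to a group the user is not in.
acls = {"a": {"eng"}, "b": {"eng"}, "c": {"hr"}, "d": {"eng"}}
results = hybrid_search(["a", "b", "c"], ["b", "d", "a"], acls, {"eng"})
print(results)  # ['b', 'a', 'd']
```

Filtering before fusion means restricted documents never reach the reranker or the LLM prompt. In a production design the same filter would more likely be pushed down into the index query itself so that restricted documents do not consume candidate slots.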
