Design an enterprise RAG system

This is an ML System Design interview question from OpenAI for Machine Learning Engineer roles.

Question

System Design: Retrieval-Augmented Generation (RAG) for Enterprise

Context

Design a production-grade, multi-tenant RAG platform for enterprise users. The system must ingest and index heterogeneous internal documents and serve secure, low-latency, cost-efficient, and accurate answers backed by citations.

Assume the following:

Scale: Up to 10 million documents per day across all tenants.
Query load: Moderate to high (varies by tenant); design to autoscale.
Content types: PDFs, HTML/web pages, emails, spreadsheets, and plain text.
Tenancy: Strong isolation with per-tenant authorization and key management.
Compliance: PII/PHI presence likely; data residency may be required for some tenants.
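
To make the scale assumption concrete, the sketch below converts the stated 10 million documents per day into per-second rates. The peak factor and chunks-per-document values are illustrative assumptions, not figures from the question; a real answer would replace them with measured numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rates(docs_per_day, peak_factor=3.0, chunks_per_doc=20):
    """Convert a daily document volume into average and peak per-second rates.

    peak_factor and chunks_per_doc are assumed values for illustration only.
    """
    avg_docs = docs_per_day / SECONDS_PER_DAY
    peak_docs = avg_docs * peak_factor
    return {
        "avg_docs_per_s": avg_docs,
        "peak_docs_per_s": peak_docs,
        # Each chunk needs one embedding, so this approximates embedding throughput.
        "peak_chunks_per_s": peak_docs * chunks_per_doc,
    }


rates = ingest_rates(10_000_000)
print({name: round(value, 1) for name, value in rates.items()})
# {'avg_docs_per_s': 115.7, 'peak_docs_per_s': 347.2, 'peak_chunks_per_s': 6944.4}
```

Under these assumptions the embeddings pipeline has to sustain several thousand chunks per second at peak, which is the number that drives batching and autoscaling decisions.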

Requirements

Multi-tenant isolation and authorization.
Ingestion throughput: Up to 10M docs/day.
Freshness: Under 5 minutes from document arrival to being searchable.
Latency SLOs: P50 ≤ 800 ms, P95 ≤ 2 s per query.
PII handling: Encryption in transit/at rest, detection/redaction/de-identification.
Budget: Bounded cost per 1k queries (optimize and justify).
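
The latency SLOs above can be checked against measured end-to-end query latencies. The following is a minimal sketch using the nearest-rank percentile definition; the function names and the sample latencies are hypothetical, and a production system would more likely read percentiles from its metrics backend.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_slo(latencies_ms, p50_ms=800, p95_ms=2000):
    """Return True when both the P50 and P95 targets from the requirements hold."""
    return (
        percentile(latencies_ms, 50) <= p50_ms
        and percentile(latencies_ms, 95) <= p95_ms
    )


# Made-up latencies in milliseconds, for illustration only.
latencies = [300, 450, 500, 620, 700, 750, 900, 1100, 1500, 1900]
print(percentile(latencies, 50), percentile(latencies, 95), meets_slo(latencies))
# 700 1900 True
```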

Deliverables

Describe the end-to-end architecture and justify trade-offs among accuracy, latency, and cost.

Include:

Ingestion, parsing/chunking, metadata extraction, embeddings pipeline.
Vector index selection, sharding strategy, and hybrid retrieval (sparse + dense).
Re-ranking, prompt orchestration, context window management, generator selection.
Response post-processing (citations, formatting, redaction), hallucination mitigation.
Evaluation: Offline/online metrics, feedback loops, active learning.
Guardrails/safety, caching, observability (tracing, drift, recall@k dashboards).
Capacity planning and autoscaling; disaster recovery; deployment options (cloud vs on‑prem).
A plan to run A/B experiments before rollout.
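
For the hybrid retrieval deliverable, one common way to merge a sparse ranking (for example BM25) with a dense ranking (vector search) is reciprocal rank fusion (RRF). The question does not prescribe a fusion method, so this is only a sketch of one option; the document IDs are made up, and k=60 is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one tenant's query, already filtered by authorization.
sparse_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF uses only ranks, so it avoids having to normalize BM25 scores against cosine similarities. The fused list would then typically be passed to the re-ranking stage.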
