Design an enterprise RAG system

Q: Design an enterprise RAG system

This is an ML System Design interview question from OpenAI for Machine Learning Engineer roles.

Question

System Design Task: Retrieval-Augmented Generation (RAG) for Enterprise Users

You are designing a multi-tenant enterprise RAG system that answers user questions from private, internal documents across many sources (e.g., docs, wikis, tickets, storage buckets). Provide a design that balances relevance, faithfulness, latency, cost, and strong access controls.

Requirements

Define scope, assumptions, and SLAs/SLOs.
Data ingestion:
- Connectors (e.g., Google Drive, SharePoint, Confluence, Slack, S3/Blob, Jira, databases via CDC).
- Incremental/delta updates, versioning, and deletions (tombstones).
Document processing:
- Chunking strategy and overlap, section-awareness, and metadata schema.
- Deduplication and canonicalization.
Embeddings:
- Model selection criteria (domain, multilingual, speed vs quality) and fine-tuning.
Vector store:
- Index layout (ANN type, sharding, namespaces) and metadata filters.
- Access-control indexing strategy.
Retrieval:
- Hybrid dense + lexical (top-k), rank fusion, and optional cross-encoder reranking.
- Diversity and recency signals.
Prompting and generation:
- Prompt assembly with citations and structured outputs.
- LLM selection, routing, and fallback.
Guardrails to reduce hallucinations:
- Similarity/coverage thresholds, abstain policies, grounding, and citation verification.
- Prompt-injection defenses.
Security and multi-tenancy:
- Tenant isolation and document-level ACL enforcement.
Evaluation and operations:
- Metrics (faithfulness, answer relevance, latency, cost), offline test sets and golden questions, online feedback loops.
- Observability, red-teaming, and cost/performance tuning (caching, batching, distillation).
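
To make the document-processing requirements above concrete, here is a minimal sketch of section-aware chunking with overlap, a small metadata schema, and hash-based deduplication. Everything in it is an illustrative assumption rather than part of the question: the `Chunk` fields, the helper names `canonicalize`, `chunk_section`, and `build_chunks`, word-based windows (a real system would more likely count tokens), and the default sizes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical metadata schema carried alongside each chunk."""
    tenant_id: str
    doc_id: str
    version: int
    section: str
    text: str
    content_hash: str


def canonicalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies hash alike."""
    return " ".join(text.split()).lower()


def chunk_section(words: list[str], size: int, overlap: int) -> list[str]:
    """Slide a window of `size` words, advancing by `size - overlap` each step."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the section
    return chunks


def build_chunks(tenant_id, doc_id, version, sections,
                 size=200, overlap=40, seen_hashes=None):
    """Chunk each (title, body) section separately and skip duplicate chunks.

    Deduplication is keyed on (tenant_id, hash), so one tenant's content
    never influences what is stored for another tenant.
    """
    seen = set() if seen_hashes is None else seen_hashes
    out = []
    for title, body in sections:
        for text in chunk_section(body.split(), size, overlap):
            digest = hashlib.sha256(canonicalize(text).encode("utf-8")).hexdigest()
            key = (tenant_id, digest)
            if key in seen:
                continue
            seen.add(key)
            out.append(Chunk(tenant_id, doc_id, version, title, text, digest))
    return out
```

Chunking per section keeps a window from straddling two headings, and storing `version` with each chunk gives delta updates and tombstones something to match against when a document changes or is deleted.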

Deliver an architecture and operational plan that connects these components end-to-end.
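
One possible shape for the query path, connecting the retrieval, guardrail, and access-control requirements, is sketched below. It is a simplification under stated assumptions: the `Hit` type and the functions `acl_filter`, `rrf`, and `retrieve` are hypothetical names, both hit lists are assumed to arrive sorted best-first, fusion uses reciprocal rank fusion with the commonly used constant k = 60, and the abstain rule (a minimum dense similarity) is only one of several reasonable policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical retriever result; lists of hits are assumed sorted best-first."""
    chunk_id: str
    tenant_id: str
    allowed_groups: frozenset[str]
    score: float  # retriever-specific scale; fusion uses rank only


def acl_filter(hits, tenant_id, user_groups):
    """Keep hits from the caller's tenant that share at least one group with the user."""
    return [h for h in hits
            if h.tenant_id == tenant_id and h.allowed_groups & user_groups]


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id for deterministic output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def retrieve(dense_hits, lexical_hits, tenant_id, user_groups,
             top_k=5, min_dense_score=0.3):
    """Return fused chunk ids, or [] to signal that the system should abstain."""
    dense = acl_filter(dense_hits, tenant_id, user_groups)
    lexical = acl_filter(lexical_hits, tenant_id, user_groups)
    # Assumed abstain policy: no permitted dense hit is similar enough.
    if not dense or max(h.score for h in dense) < min_dense_score:
        return []
    fused = rrf([[h.chunk_id for h in dense], [h.chunk_id for h in lexical]])
    return [chunk_id for chunk_id, _ in fused[:top_k]]
```

The sketch filters after retrieval only to stay short. In a real deployment the tenant and ACL predicates would more likely be pushed into the index query as metadata filters, so that unauthorized chunks are never fetched and top-k is not depleted by post-filtering. An empty result is the signal for the generation layer to decline rather than answer without grounding.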
