# Design a Retrieval-Augmented Generation (RAG) system
Company: OpenAI
Role: Software Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Technical Screen
## Prompt
Design a **Retrieval-Augmented Generation (RAG)** system that answers user questions using an organization’s internal documents (PDFs, wiki pages, tickets, and policies) while minimizing hallucinations.
## Requirements
- **Inputs**: user natural-language query; a continuously updated document corpus.
- **Outputs**: a grounded answer with **citations** (snippets + document links/IDs).
- **Quality goals**:
  - High answer correctness and groundedness.
  - Handle ambiguous questions by asking clarifying questions when needed.
- **System goals**:
  - Low latency (interactive).
  - Scalable to millions of documents.
  - Support frequent document updates (new/edited/deleted docs).
  - Security: enforce **document-level access control** (per user/role) and prevent data leakage.
  - Observability: logging, monitoring, evaluation, and iterative improvement.
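
One way to read the document-level access control requirement is as a filter applied to retrieved chunks before any text reaches the LLM. The sketch below is illustrative only: the `Chunk` type, its `allowed_roles` field, and `filter_by_access` are hypothetical names, and it assumes each chunk carries a copy of its source document's ACL written at ingestion time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    # Hypothetical ACL metadata, assumed to be copied from the source
    # document when the chunk is ingested.
    allowed_roles: frozenset


def filter_by_access(chunks, user_roles):
    """Keep only chunks whose ACL shares at least one role with the user."""
    roles = frozenset(user_roles)
    return [c for c in chunks if c.allowed_roles & roles]


chunks = [
    Chunk("hr-1", "Salary bands ...", frozenset({"hr"})),
    Chunk("wiki-1", "Onboarding guide ...", frozenset({"hr", "eng"})),
]
visible = filter_by_access(chunks, {"eng"})
print([c.doc_id for c in visible])
```

In a real system the same predicate would more likely be pushed down into the index query as a metadata filter, so that restricted chunks are never fetched at all; post-filtering as shown here is the simplest form to reason about.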
## What to cover
Explain the end-to-end architecture, including:
- Ingestion + preprocessing (chunking, metadata, dedup).
- Embedding generation and indexing.
- Retrieval (vector + keyword), reranking, and context construction.
- LLM prompting and citation generation.
- Caching, rate limiting, and fallbacks.
- Offline/online evaluation and A/B testing.
- Failure modes and mitigations (hallucinations, stale data, prompt injection).
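
For the "vector + keyword" retrieval item, a candidate needs some way to merge two ranked lists before reranking. The prompt does not prescribe one; the sketch below uses reciprocal rank fusion (RRF) as an assumed choice, with the commonly used constant `k = 60`. The function name and the example document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID so the output
    is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector index and a keyword (BM25-style) index.
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF only needs ranks, not raw scores, which avoids having to calibrate cosine similarities against keyword scores. The fused list would then typically be truncated and passed to a reranker before context construction.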
## Quick Answer
This question evaluates a candidate's ability to design production-grade Retrieval-Augmented Generation systems. It tests competencies in information retrieval, embedding and indexing strategies, LLM integration, scalability, access control, and observability within the ML system design domain.