Design a Production RAG System for Enterprise Document QA
Context
You are designing a Retrieval-Augmented Generation (RAG) system to answer questions over large, evolving enterprise document corpora (policies, specs, tickets, wikis, PDFs, spreadsheets, code snippets). The system must support access controls, multilingual content, and strong safety/PII guarantees.
Requirements
Specify the end-to-end architecture and key design choices for:
- Ingestion and Chunking
  - Connectors (file stores, wikis, ticketing systems), parsing (PDF/Office/HTML), and normalization
  - Chunking strategy: window size, stride, hierarchical metadata, handling of tables and code
- Embeddings
  - Model selection (monolingual vs multilingual), dimensionality, normalization, multi-vector strategy (title/body/table/code)
- Indexing and ANN
  - Vector store choice; ANN algorithm (HNSW/IVF/IVF-PQ) and parameters
  - Recall/latency/cost trade-offs; sharding and replication
- Retrieval Pipeline
  - Hybrid retrieval (BM25 + dense), filters (ACLs, metadata), time decay, multi-vector fusion
  - Re-ranking (cross-encoder), multi-stage retrieval, answerability scoring
- Prompt Orchestration
  - Grounding and citations, tool/function calls, context packing and deduplication
- Hallucination Mitigation
  - Attribution checks, coverage thresholds, refusal policy
- Caching and Freshness
  - Query/result/vector caches; invalidation; incremental updates and rebuilds
- Multilingual and Safety
  - Language detection and cross-lingual retrieval; PII redaction and policy enforcement
- Scalability, Latency, and SLAs
  - Capacity planning, concurrency, tail-latency budgets, vector store scaling
- Observability and Evaluation
  - Metrics (retrieval/answer quality), drift monitoring, offline gold sets, synthetic data, online A/B tests
- Human Feedback and Cost Controls
  - Feedback loops, active learning, budget-aware retrieval/generation
- Fallback Strategies
  - When retrieval is weak: clarification, escalation, graceful refusal
- API Design and Data Schema
  - REST/JSON APIs; schemas for documents, chunks, embeddings, and citations
- Rollout Plan
  - Staging, backfill, canary, monitoring, and incident playbooks
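
To make the "window size, stride" item under Ingestion and Chunking concrete, here is a minimal sketch of sliding-window chunking over a pre-tokenized document. The `Chunk` type, the `chunk_tokens` function, and the default sizes are all hypothetical and chosen for illustration only; a real design would also carry hierarchical metadata and treat tables and code separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_id: str
    index: int   # position of the chunk within the document
    start: int   # offset of the chunk's first token in the document
    text: str


def chunk_tokens(doc_id: str, tokens: List[str],
                 window: int = 200, stride: int = 150) -> List[Chunk]:
    """Split tokens into windows of `window` tokens, advancing by `stride`.

    Consecutive chunks overlap by (window - stride) tokens. The defaults
    are illustrative assumptions, not recommended values.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("require window > 0 and 0 < stride <= window")
    chunks: List[Chunk] = []
    start = 0
    while True:
        piece = tokens[start:start + window]
        if not piece:
            break  # empty document
        chunks.append(Chunk(doc_id, len(chunks), start, " ".join(piece)))
        if start + window >= len(tokens):
            break  # this window already reached the end of the document
        start += stride
    return chunks
```

A stride smaller than the window trades index size for context continuity: each boundary sentence appears in two chunks, so a query that matches text near a boundary can still retrieve a chunk containing its surrounding context.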
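
For the hybrid retrieval item under Retrieval Pipeline, one common way to combine a BM25 ranking with a dense ranking is reciprocal rank fusion (RRF). The sketch below is one possible approach, not a prescribed one: the function names, the in-memory `acl` mapping, and the constant `k = 60` are assumptions made for illustration. It filters each ranking by ACL before fusing, and denies access to any chunk with no ACL entry.

```python
from typing import Dict, List, Set


def filter_by_acl(chunk_ids: List[str], acl: Dict[str, Set[str]],
                  user_groups: Set[str]) -> List[str]:
    """Keep chunks whose allowed groups intersect the user's groups.

    Chunks missing from `acl` are dropped (deny by default).
    """
    return [c for c in chunk_ids if acl.get(c, set()) & user_groups]


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(c) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


def hybrid_retrieve(bm25_ranking: List[str], dense_ranking: List[str],
                    acl: Dict[str, Set[str]], user_groups: Set[str],
                    top_k: int = 5) -> List[str]:
    """Apply the ACL filter to each ranking, then fuse and truncate."""
    allowed = [filter_by_acl(r, acl, user_groups)
               for r in (bm25_ranking, dense_ranking)]
    return rrf_fuse(allowed)[:top_k]
```

Filtering before fusion keeps unauthorized chunks from influencing the fused order. In a deployed system the same filter would more likely be pushed down into the BM25 and vector queries so that each retriever returns a full page of permitted results.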