Design an End-to-End Enterprise RAG Search System
Background
You are tasked with designing a Retrieval-Augmented Generation (RAG) search system for enterprise users. The system should allow employees to ask natural-language questions and receive grounded, cited answers using their organization’s private documents and tools.
Assume a multi-tenant, cloud-hosted setup with strict security and compliance requirements. Content spans PDFs, Office docs, wikis, tickets, chats, and databases.
Scale assumptions (adjust as needed):
- 1,000+ active users; 10–100 queries/sec peak.
- 10–100 million text chunks indexed across tenants; up to 1 million new/updated documents per day.
- Data freshness target: under 5 minutes from change to searchable.
- Latency SLO: P50 ≤ 1.5s, P95 ≤ 3s for typical questions; streaming responses acceptable.
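As a rough sanity check, the scale assumptions above can be turned into back-of-envelope numbers. The sketch below is only illustrative: the embedding dimension (1024) and float32 storage (4 bytes per value) are assumptions that the prompt does not specify, and the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_ingest_rate(docs_per_day: int) -> float:
    """Average documents per second if the daily volume arrived evenly."""
    return docs_per_day / SECONDS_PER_DAY


def raw_vector_bytes(num_chunks: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw embedding storage, ignoring index overhead, replicas, and metadata."""
    return num_chunks * dim * bytes_per_value


if __name__ == "__main__":
    # Upper bounds taken from the scale assumptions in the prompt.
    docs_per_day = 1_000_000
    num_chunks = 100_000_000
    # ASSUMPTION: 1024-dimensional float32 embeddings (not stated in the prompt).
    dim = 1024

    rate = avg_ingest_rate(docs_per_day)
    size_gb = raw_vector_bytes(num_chunks, dim) / 1e9
    print(f"average ingest rate: {rate:.1f} docs/sec")
    print(f"raw vector storage: {size_gb:.1f} GB")
```

Real ingestion is usually bursty, so peak rates can sit well above this average, and the storage figure grows with index structures and replication.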
Task
Design the system and cover the following:
- Architecture: High-level components and request/response flow (ingestion, indexing, retrieval, generation, observability).
- Data ingestion: Connectors, parsing/OCR, normalization, chunking, metadata/ACLs, dedup/versioning, enrichment (embeddings, entities), and freshness.
- Retriever and generator selection: Dense vs. sparse vs. hybrid retrieval, reranking, LLM choice, grounding, citations.
- Indexing: Vector/sparse index choices, schema, sharding/partitioning, filters, and update strategies.
- Latency: End-to-end budgets by stage, caching, and performance optimizations.
- Security and privacy: AuthN/Z, multi-tenancy/isolation, encryption, audit, prompt-injection defenses, data handling.
- Scalability and operations: Horizontal scaling, backfills/re-embeddings, monitoring/eval, cost controls, failure modes, and rollouts.
Include key trade-offs and minimal diagrams-in-words (a clear component-by-component description is sufficient).
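To make two of the listed topics concrete (hybrid retrieval and ACL filters), here is a minimal sketch of one possible approach: filter each ranked list by the user's group membership, then merge the dense and sparse rankings with reciprocal rank fusion (RRF). This is an illustration, not a prescribed answer. The function names, the in-memory ACL map, and the constant `k=60` are assumptions made for the example; a production system would typically push the ACL filter down into the index query rather than filter afterwards.

```python
from collections import defaultdict


def acl_filter(ranking, doc_acl, user_groups):
    """Keep only documents whose allowed groups overlap the user's groups."""
    return [d for d in ranking if doc_acl.get(d, set()) & user_groups]


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical results from a dense (vector) and a sparse (keyword) retriever.
    dense = ["a", "b", "c"]
    sparse = ["b", "d", "a"]
    # Hypothetical ACLs: document id -> groups allowed to read it.
    doc_acl = {"a": {"eng"}, "b": {"eng", "hr"}, "c": {"hr"}, "d": {"eng"}}
    user_groups = {"eng"}

    visible = [acl_filter(r, doc_acl, user_groups) for r in (dense, sparse)]
    print(rrf_fuse(visible))
```

Documents missing from the ACL map are treated as unreadable here, a deny-by-default choice that matches the strict security requirement in the background. The fused list would normally be passed to a reranker before generation.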