System Design Task: Retrieval-Augmented Generation (RAG) for Enterprise Users
You are designing a multi-tenant enterprise RAG system that answers user questions from private, internal documents across many sources (e.g., docs, wikis, tickets, storage buckets). Provide a design that balances relevance, faithfulness, latency, cost, and strong access controls.
Requirements
- Define scope, assumptions, and SLAs/SLOs.
- Data ingestion:
  - Connectors (e.g., Google Drive, SharePoint, Confluence, Slack, S3/Blob, Jira, databases via CDC).
  - Incremental/delta updates, versioning, and deletions (tombstones).
- Document processing:
  - Chunking strategy and overlap, section-awareness, and metadata schema.
  - Deduplication and canonicalization.
- Embeddings:
  - Model selection criteria (domain, multilingual, speed vs. quality) and fine-tuning.
- Vector store:
  - Index layout (ANN type, sharding, namespaces) and metadata filters.
  - Access-control indexing strategy.
- Retrieval:
  - Hybrid dense + lexical (top-k), rank fusion, and optional cross-encoder reranking.
  - Diversity and recency signals.
- Prompting and generation:
  - Prompt assembly with citations and structured outputs.
  - LLM selection, routing, and fallback.
- Guardrails to reduce hallucinations:
  - Similarity/coverage thresholds, abstain policies, grounding, and citation verification.
  - Prompt-injection defenses.
- Security and multi-tenancy:
  - Tenant isolation and document-level ACL enforcement.
- Evaluation and operations:
  - Metrics (faithfulness, answer relevance, latency, cost), offline test sets and golden questions, online feedback loops.
  - Observability, red-teaming, and cost/performance tuning (caching, batching, distillation).
Deliver an architecture and operational plan that connects these components end-to-end.
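
As an illustration of how two of the requirements above interact (hybrid retrieval with rank fusion, and document-level ACL enforcement), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Chunk` fields, the `is_visible`, `rrf_fuse`, and `retrieve` helpers, and the group-based ACL model are hypothetical and not part of the task statement. It uses reciprocal rank fusion with the commonly used constant k = 60, and it filters in application code, whereas a real deployment would more likely push the tenant and ACL filter into the index query itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk metadata; field names are illustrative only."""
    chunk_id: str
    tenant_id: str
    allowed_groups: frozenset[str]


def is_visible(chunk: Chunk, tenant_id: str, user_groups: set[str]) -> bool:
    # Tenant isolation first, then a document-level ACL check by group overlap.
    return chunk.tenant_id == tenant_id and bool(chunk.allowed_groups & user_groups)


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per item.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def retrieve(
    dense: list[str],
    lexical: list[str],
    chunks: dict[str, Chunk],
    tenant_id: str,
    user_groups: set[str],
    top_k: int = 3,
) -> list[str]:
    # Drop anything the caller may not see BEFORE fusion, so hidden chunks
    # cannot influence the ranking or leak into the prompt.
    def allowed(ranking: list[str]) -> list[str]:
        return [
            cid for cid in ranking
            if cid in chunks and is_visible(chunks[cid], tenant_id, user_groups)
        ]

    return rrf_fuse([allowed(dense), allowed(lexical)])[:top_k]


# Toy data: two tenants, group-based ACLs (all values are made up).
chunks = {
    "a": Chunk("a", "acme", frozenset({"eng"})),
    "b": Chunk("b", "acme", frozenset({"hr"})),
    "c": Chunk("c", "acme", frozenset({"eng", "hr"})),
    "d": Chunk("d", "globex", frozenset({"eng"})),
}
dense_hits = ["b", "a", "d", "c"]   # ranked ids from a dense (vector) search
lexical_hits = ["a", "c", "b"]      # ranked ids from a lexical (e.g. BM25) search

result = retrieve(dense_hits, lexical_hits, chunks, "acme", {"eng"})
print(result)
```

With this toy data, an `acme` user in the `eng` group gets only chunks `a` and `c`: chunk `b` is excluded by its ACL and chunk `d` by tenant isolation, even though both appear in the raw rankings. Filtering before fusion rather than after is a deliberate choice in this sketch, since it keeps inaccessible documents from affecting the scores of accessible ones.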