Design enterprise RAG search system

Q: Design enterprise RAG search system

This question evaluates an engineer's ability to design an end-to-end Retrieval-Augmented Generation (RAG) search system for enterprise settings. It tests competencies in ML system design, information retrieval (dense, sparse, and hybrid), vector and sparse indexing, data ingestion and enrichment, LLM selection and grounding, security and compliance, scalability, and observability. Interviewers commonly use it to assess architectural reasoning and trade-off analysis for production ML services, in particular how candidates balance latency, freshness, multi-tenant isolation, and operational concerns. It belongs to the ML System Design domain and requires both high-level conceptual understanding and practical, application-level design detail.

Question

Design an End-to-End Enterprise RAG Search System

Background

You are tasked with designing a Retrieval-Augmented Generation (RAG) search system for enterprise users. The system should allow employees to ask natural-language questions and receive grounded, cited answers using their organization’s private documents and tools.

Assume a multi-tenant, cloud-hosted setup with strict security and compliance requirements. Content spans PDFs, Office docs, wikis, tickets, chats, and databases. Scale assumptions (adjust as needed):

1,000+ active users; 10–100 queries/sec peak.
10–100 million text chunks indexed across tenants; up to 1 million new/updated documents per day.
Data freshness target: under 5 minutes from change to searchable.
Latency SLO: P50 ≤ 1.5s, P95 ≤ 3s for typical questions; streaming responses acceptable.
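
To make the scale assumptions above concrete, a quick back-of-envelope calculation can help before choosing an index. The sketch below is only illustrative: the 768-dimensional float32 embedding is an assumption (the question does not specify an embedding model), and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def raw_vector_bytes(num_chunks: int, dim: int = 768, bytes_per_value: int = 4) -> int:
    """Raw embedding storage in bytes, excluding index overhead and metadata.

    dim=768 and 4-byte float32 values are assumptions, not part of the question.
    """
    return num_chunks * dim * bytes_per_value


def avg_docs_per_sec(docs_per_day: int) -> float:
    """Average ingestion rate implied by a daily document volume."""
    return docs_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for chunks in (10_000_000, 100_000_000):
        gb = raw_vector_bytes(chunks) / 1e9
        print(f"{chunks:,} chunks -> {gb:.1f} GB of raw vectors")
    print(f"1,000,000 docs/day -> {avg_docs_per_sec(1_000_000):.1f} docs/sec on average")
```

Under these assumptions, 10 million chunks need roughly 30.7 GB of raw vector storage and 100 million need roughly 307 GB, and 1 million documents per day averages about 11.6 documents per second. Peak ingestion rates and index overhead would push both figures higher.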

Task

Design the system and cover the following:

Architecture: High-level components and request/response flow (ingestion, indexing, retrieval, generation, observability).
Data ingestion: Connectors, parsing/OCR, normalization, chunking, metadata/ACLs, dedup/versioning, enrichment (embeddings, entities), and freshness.
Retriever and generator selection: Dense vs. sparse vs. hybrid retrieval, reranking, LLM choice, grounding, citations.
Indexing: Vector/sparse index choices, schema, sharding/partitioning, filters, and update strategies.
Latency: End-to-end budgets by stage, caching, and performance optimizations.
Security and privacy: AuthN/Z, multi-tenancy/isolation, encryption, audit, prompt-injection defenses, data handling.
Scalability and operations: Horizontal scaling, backfills/re-embeddings, monitoring/eval, cost controls, failure modes, and rollouts.
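
The metadata/ACL, filter, and multi-tenancy items above can be illustrated with a minimal sketch of permission-aware filtering. Everything here is an assumption for illustration: the `Chunk` fields, the group-based ACL model, and the `visible_chunks` helper are hypothetical, and a production system would more likely push this filter into the index query than apply it in application code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Chunk:
    """Hypothetical indexed chunk carrying tenant and ACL metadata."""
    chunk_id: str
    tenant_id: str
    allowed_groups: frozenset[str]
    text: str


def visible_chunks(
    chunks: Iterable[Chunk], tenant_id: str, user_groups: Iterable[str]
) -> list[Chunk]:
    """Keep only chunks in the caller's tenant that share at least one group with the caller."""
    groups = frozenset(user_groups)
    return [
        c for c in chunks
        if c.tenant_id == tenant_id and c.allowed_groups & groups
    ]


if __name__ == "__main__":
    corpus = [
        Chunk("c1", "acme", frozenset({"eng"}), "Deployment runbook"),
        Chunk("c2", "acme", frozenset({"hr"}), "Salary bands"),
        Chunk("c3", "globex", frozenset({"eng"}), "Other tenant's runbook"),
    ]
    print([c.chunk_id for c in visible_chunks(corpus, "acme", ["eng"])])
```

Filtering on tenant first and on group overlap second means a chunk from another tenant is never returned, even if a group name happens to collide across tenants.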

Include key trade-offs and minimal diagrams-in-words (a clear component-by-component description is sufficient).
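
One trade-off worth illustrating is how a hybrid retriever might merge dense and sparse results. The sketch below uses reciprocal rank fusion (RRF), which is one common option rather than the required answer. The function name, the example document IDs, and the constant k=60 (a conventional default) are assumptions.

```python
from collections import defaultdict
from typing import Iterable


def reciprocal_rank_fusion(rankings: Iterable[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]   # hypothetical vector-search results
    sparse_hits = ["b", "d", "a"]  # hypothetical keyword (e.g. BM25) results
    print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

RRF only needs ranks, so it avoids having to calibrate dense similarity scores against sparse scores. The cost is that it discards score magnitudes, which a learned reranker applied to the fused candidates could recover.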
