Design an ML-powered enterprise search system using Retrieval-Augmented Generation (RAG) under the following context and constraints.
Assume textual content (no heavy images), standard enterprise auth (OIDC/SAML), and typical query lengths (short questions or keywords). Where details are not stated, make minimal, reasonable assumptions to complete the design; illustrative sketches for each part follow the list below.
(a) Ingestion and chunking: Describe parsing, deduplication, metadata extraction, embedding generation, chunk-size strategy, versioning, and incremental updates.
(b) Indexing and retrieval: Propose a hybrid sparse+vector approach (BM25 + ANN), metadata filters, tenant isolation, query understanding/reformulation, top-k selection, and cross-encoder reranking.
(c) Generation: Outline prompt design, grounding with citations, constrained decoding, tool usage, streaming responses, and multilingual handling.
(d) Guardrails and safety: Methods for hallucination reduction, citation enforcement, out-of-policy refusal, PII/security controls, and ACL-aware retrieval.
(e) Evaluation and monitoring: Offline metrics (e.g., NDCG@k, recall@k, answer faithfulness), online A/B tests, user feedback loops, and drift/latency/cost monitoring.
(f) Architecture and scaling: Service decomposition, model hosting/batching, caching, vector store selection, backpressure, failover, and disaster recovery.
(g) Cost and latency calculations: Derive per-stage latency and cost, a capacity plan for embeddings, the ANN index size, and compute requirements. Justify model choices under the constraints.
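A minimal sketch for part (a), assuming whitespace tokens as a stand-in for the embedding model's tokenizer: hash-addressed chunking with overlap plus an incremental-update filter. The names Chunk, chunk_document, and incremental_update and the 300/50-token sizes are illustrative choices, not requirements.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Chunk:
    doc_id: str
    chunk_id: str
    text: str
    content_hash: str
    metadata: dict = field(default_factory=dict)

def chunk_document(doc_id: str, text: str, metadata: dict,
                   target_tokens: int = 300, overlap_tokens: int = 50) -> list[Chunk]:
    """Split a parsed document into overlapping, hash-addressed chunks.

    Whitespace tokens approximate model tokens; a real pipeline would use the
    embedding model's tokenizer. The content hash supports exact-duplicate
    detection and incremental re-indexing (unchanged hashes are skipped).
    """
    tokens = text.split()
    chunks, start = [], 0
    while start < len(tokens):
        body = " ".join(tokens[start:start + target_tokens])
        content_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
        chunks.append(Chunk(
            doc_id=doc_id,
            chunk_id=f"{doc_id}:{start}",
            text=body,
            content_hash=content_hash,
            metadata={**metadata, "token_offset": start},
        ))
        if start + target_tokens >= len(tokens):
            break
        start += target_tokens - overlap_tokens
    return chunks

def incremental_update(new_chunks: list[Chunk], known_hashes: set[str]) -> list[Chunk]:
    """Return only chunks whose content changed since the last ingestion run."""
    return [c for c in new_chunks if c.content_hash not in known_hashes]
```

The same content hash can double as a version key: re-ingesting a document re-embeds only the chunks whose hashes changed.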
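For part (b), a sketch of hybrid fusion assuming each retriever (BM25 and ANN) has already applied tenant and metadata filters, so fusion never sees out-of-tenant chunks. Reciprocal rank fusion with the conventional constant k=60 merges the two ranked lists before cross-encoder reranking; the example ids are made up.

```python
def reciprocal_rank_fusion(result_lists, k: int = 60, top_k: int = 10):
    """Fuse ranked id lists from BM25 and ANN retrieval with reciprocal rank fusion.

    The fused score for a chunk is the sum of 1 / (k + rank) over the lists in
    which it appears; chunks ranked highly by either retriever float to the top.
    """
    scores = {}
    for results in result_lists:
        for rank, chunk_id in enumerate(results, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Example: candidates from both retrievers for the same (already filtered) tenant.
bm25_hits = ["doc7:0", "doc3:300", "doc9:0"]
ann_hits = ["doc3:300", "doc2:600", "doc7:0"]
print(reciprocal_rank_fusion([bm25_hits, ann_hits]))
```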
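For part (c), one possible grounding prompt layout; the [n] citation convention, the passage fields (chunk_id, text), and the refusal phrasing are assumptions chosen so that citations can be checked mechanically downstream.

```python
def build_grounded_prompt(question: str, passages: list[dict]) -> str:
    """Assemble a grounding prompt in which every passage carries a citation tag.

    Each passage dict is assumed to hold 'chunk_id' and 'text'. The instructions
    ask the model to cite passage numbers and to refuse when the context is
    insufficient, which keeps generation anchored to retrieved evidence.
    """
    context_lines = [
        f"[{i}] ({p['chunk_id']}) {p['text']}"
        for i, p in enumerate(passages, start=1)
    ]
    return (
        "Answer the question using only the numbered context passages below.\n"
        "Cite every factual statement with its passage number, e.g. [2].\n"
        'If the context does not contain the answer, reply "I don\'t know".\n\n'
        "Context:\n" + "\n".join(context_lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
```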
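For part (d), a small post-generation guardrail that rejects drafts whose sentences lack valid citations; sentence splitting on punctuation and the [n] marker format match the prompt sketch above and are assumptions. A production system would add NLI-style faithfulness checks, PII scanning, and policy classifiers on top.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")

def enforce_citations(answer: str, num_passages: int) -> tuple[bool, list[str]]:
    """Verify that each sentence of a drafted answer cites a valid passage.

    Returns (ok, problems); a failed check can trigger regeneration or refusal.
    """
    problems = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        cited = [int(m) for m in CITATION.findall(sentence)]
        if not cited:
            problems.append(f"uncited sentence: {sentence!r}")
        elif any(c < 1 or c > num_passages for c in cited):
            problems.append(f"citation out of range: {sentence!r}")
    return (not problems, problems)
```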
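For part (e), plain implementations of recall@k and NDCG@k over labeled query-chunk judgments; the linear-gain DCG variant shown here is one common choice.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k: int) -> float:
    """Fraction of relevant chunks that appear in the top-k ranking."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for cid in ranked_ids[:k] if cid in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevance: dict, k: int) -> float:
    """NDCG@k with graded relevance labels (0 = irrelevant), linear gain."""
    dcg = sum(
        relevance.get(cid, 0) / math.log2(rank + 1)
        for rank, cid in enumerate(ranked_ids[:k], start=1)
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```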
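For part (f), a sketch of request batching plus a bounded cache in front of a hosted embedding model; embed_fn, the batch size, and the insertion-order eviction policy are assumptions standing in for a real serving stack.

```python
import hashlib
from collections import OrderedDict

class CachedBatchEmbedder:
    """Batch texts before calling the embedding model and cache the vectors.

    embed_fn is an assumed callable that maps a list of texts to a list of
    vectors (for example, a thin wrapper around a hosted model endpoint).
    Batching amortizes per-call overhead; the bounded cache absorbs repeated
    queries and unchanged chunks during re-ingestion.
    """

    def __init__(self, embed_fn, batch_size: int = 64, cache_size: int = 10_000):
        self.embed_fn = embed_fn
        self.batch_size = batch_size
        self.cache_size = cache_size
        self.cache: OrderedDict[str, list[float]] = OrderedDict()

    def _key(self, text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts: list[str]) -> list[list[float]]:
        keys = [self._key(t) for t in texts]
        results = {k: self.cache[k] for k in keys if k in self.cache}
        missing = [(k, t) for k, t in zip(keys, texts) if k not in results]
        # Call the model only for cache misses, in fixed-size batches.
        for i in range(0, len(missing), self.batch_size):
            batch = missing[i:i + self.batch_size]
            vectors = self.embed_fn([t for _, t in batch])
            for (key, _), vec in zip(batch, vectors):
                results[key] = vec
                self.cache[key] = vec
                if len(self.cache) > self.cache_size:
                    self.cache.popitem(last=False)  # drop the oldest entry
        return [results[k] for k in keys]
```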
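For part (g), a worked back-of-envelope calculation showing the method; every number (corpus size, chunk counts, embedding dimensionality, prices, latency budgets) is an assumed placeholder to be replaced with measured values.

```python
# Back-of-envelope sizing; every figure is an assumption used to illustrate
# the method, not a measured or quoted number.
docs = 2_000_000                # documents in the corpus
chunks_per_doc = 8              # ~300-token chunks with overlap
dim = 768                       # embedding dimensionality
bytes_per_float = 4             # float32, before any compression/quantization

n_chunks = docs * chunks_per_doc
vector_bytes = n_chunks * dim * bytes_per_float
index_overhead = 1.5            # extra space for ANN graph links and metadata
index_gib = vector_bytes * index_overhead / 2**30

queries_per_day = 200_000
tokens_per_query = 2_500        # prompt (retrieved context) plus completion
usd_per_1k_tokens = 0.002       # blended generation price
gen_cost_per_day = queries_per_day * tokens_per_query / 1_000 * usd_per_1k_tokens

stage_p50_ms = {                # assumed per-stage latency budget
    "query understanding": 30,
    "hybrid retrieval (BM25 + ANN)": 80,
    "cross-encoder rerank, 50 candidates": 120,
    "LLM time to first token": 400,
}

print(f"chunks to embed:       {n_chunks:,}")
print(f"ANN index size:        {index_gib:.1f} GiB")
print(f"generation cost / day: ${gen_cost_per_day:,.0f}")
print(f"p50 to first token:    {sum(stage_p50_ms.values())} ms")
```

Under these assumptions the index (roughly 69 GiB) fits on a single large-memory node; sharding and vector quantization become relevant only as the corpus grows beyond that.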