Design an enterprise RAG system
Company: OpenAI
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Technical Screen
Design a retrieval-augmented generation (RAG) system for enterprise users.

Requirements: multi-tenant isolation and authorization; ingestion of heterogeneous documents (PDF, HTML, emails, spreadsheets) at up to 10M docs/day; near-real-time freshness (<5 minutes from arrival to searchable); P50 latency ≤800 ms and P95 ≤2 s per query; strong PII handling (encryption at rest/in transit, redaction); budget constraints per 1k queries.

Describe the end-to-end architecture: ingestion, parsing/chunking, metadata extraction, embeddings pipeline, vector index selection and sharding, hybrid (sparse+dense) retrieval, re-ranking, prompt orchestration and context window management, generator selection, and response post-processing.

Address evaluation and offline/online metrics, feedback loops and active learning, hallucination mitigation (citation grounding, filters), guardrails/safety, caching, observability (tracing, drift, recall@k dashboards), capacity planning and autoscaling, disaster recovery, and deployment options (cloud vs on-prem).

Justify trade-offs among accuracy, latency, and cost, and outline a plan to run A/B experiments before rollout.
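The ingestion and freshness requirements translate into a back-of-envelope capacity estimate. The sketch below is a minimal Python illustration. Only the 10M docs/day and 5-minute freshness figures come from the question. The average of 10 chunks per document, the 3x peak-to-average burst factor, and the 500 chunks/s of embedding throughput per worker are hypothetical placeholders that should be replaced with measured values.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


@dataclass
class IngestAssumptions:
    docs_per_day: int = 10_000_000          # from the requirements
    freshness_budget_s: int = 5 * 60        # <5 minutes from arrival to searchable
    chunks_per_doc: float = 10.0            # hypothetical average after chunking
    peak_to_avg: float = 3.0                # hypothetical burst factor
    chunks_per_s_per_worker: float = 500.0  # hypothetical embedding throughput


def ingest_capacity(a: IngestAssumptions) -> dict:
    """Back-of-envelope sizing for the parse -> chunk -> embed -> index path."""
    avg_docs_per_s = a.docs_per_day / SECONDS_PER_DAY
    peak_chunks_per_s = avg_docs_per_s * a.chunks_per_doc * a.peak_to_avg
    # Provision for peak, not average, so queues do not outgrow the freshness window.
    workers = math.ceil(peak_chunks_per_s / a.chunks_per_s_per_worker)
    # Chunks arriving during one freshness window at peak rate; a backlog the
    # workers cannot drain within that window would break the freshness target.
    chunks_per_window = peak_chunks_per_s * a.freshness_budget_s
    return {
        "avg_docs_per_s": avg_docs_per_s,
        "peak_chunks_per_s": peak_chunks_per_s,
        "workers": workers,
        "chunks_per_window": chunks_per_window,
    }


if __name__ == "__main__":
    for key, value in ingest_capacity(IngestAssumptions()).items():
        print(f"{key}: {value:,.1f}")
```

Under these assumptions the average rate is about 116 docs/s, the peak is about 3,472 chunks/s, and 7 workers are needed with no headroom. Every hypothetical input moves the result, so this is a template for the capacity-planning discussion rather than a sizing answer.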
Quick Answer: This question evaluates a candidate's competency in ML system design, specifically retrieval-augmented generation, large-scale ingestion and indexing, vector search and hybrid retrieval, multi-tenant isolation, key management, and compliance-aware data handling.
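Hybrid retrieval and multi-tenant isolation interact. One reasonable design applies authorization filtering before sparse and dense results are fused, so documents a user cannot see never influence the ranking. The following is a minimal sketch under two assumptions. First, fusion uses reciprocal rank fusion (RRF), scoring each document as the sum of 1/(k + rank) over the result lists, with the commonly used k = 60. Second, results arrive as a hypothetical `Hit` record carrying a tenant ID and ACL groups. The question prescribes neither.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical retrieval hit carrying the metadata needed for authorization."""
    doc_id: str
    tenant_id: str
    allowed_groups: frozenset


def authorized(hit: Hit, tenant_id: str, user_groups: set) -> bool:
    # Tenant isolation first, then the document-level ACL check.
    return hit.tenant_id == tenant_id and bool(hit.allowed_groups & user_groups)


def rrf_fuse(ranked_lists, tenant_id, user_groups, k=60, top_n=5):
    """Reciprocal rank fusion over sparse/dense result lists, after ACL filtering."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        # Filter before ranking so unauthorized hits cannot occupy rank positions.
        visible = [h for h in ranked if authorized(h, tenant_id, user_groups)]
        for rank, hit in enumerate(visible, start=1):
            scores[hit.doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


if __name__ == "__main__":
    eng = frozenset({"eng"})
    sparse = [Hit("a", "t1", eng), Hit("x", "t2", eng), Hit("b", "t1", eng)]
    dense = [Hit("b", "t1", eng), Hit("c", "t1", frozenset({"hr"})), Hit("d", "t1", eng)]
    print(rrf_fuse([sparse, dense], tenant_id="t1", user_groups={"eng"}))
    # prints ['b', 'a', 'd']
```

Because RRF uses only ranks, it avoids calibrating BM25 scores against cosine similarities. A weighted or learned fusion could replace it if offline recall@k shows a gain. In production the tenant and ACL filter would more likely be pushed into the index query itself than applied in application code.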