Design a RAG question-answering system
Company: Harvey
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
## Scenario
Design a **Retrieval-Augmented Generation (RAG)** system that answers user questions using an internal document corpus (e.g., product docs, policies, runbooks). The system should ground answers in the corpus and cite sources.
## Requirements
### Functional
- Users submit a natural-language query and receive an answer generated by an LLM.
- The answer must be grounded in retrieved documents (include citations/links/IDs).
- Support document ingestion/updates (new docs, edits, deletions).
- Handle multi-turn conversations (optional, but describe how you would support them).
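
One way to make the citation requirement concrete is to number the retrieved chunks in the prompt and keep a map from each number back to its source document. The sketch below is only an illustration: `Chunk`, `build_prompt`, and the character-based budget `max_chars` are hypothetical names and simplifications (a real system would more likely budget in tokens).

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical retrieved chunk: the source document ID plus its text."""
    doc_id: str
    text: str


def build_prompt(question: str, chunks: list[Chunk], max_chars: int = 4000):
    """Build a grounded prompt with numbered sources.

    Returns the prompt and a map from citation number to doc_id, so that
    "[n]" markers in the model's answer can be resolved to links or IDs.
    Chunks are assumed to arrive in relevance order; we stop adding them
    once the (assumed) character budget would be exceeded.
    """
    lines: list[str] = []
    citations: dict[int, str] = {}
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        if used + len(chunk.text) > max_chars:
            break
        lines.append(f"[{i}] {chunk.text}")
        citations[i] = chunk.doc_id
        used += len(chunk.text)

    sources = "\n".join(lines)
    prompt = (
        "Answer using only the sources below. Cite sources as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, citations
```

For multi-turn support, the same function could be called with a question that has first been rewritten to be self-contained using the conversation history; that rewriting step is not shown here.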
### Non-functional
- Latency target (p95): e.g., **< 3 seconds** for typical queries.
- Availability: e.g., **99.9%**.
- Data privacy: some documents may be access-controlled per user/team.
- Quality: minimize hallucinations; provide a way to evaluate and monitor quality.
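
The data privacy requirement implies that access control has to be enforced before any text reaches the LLM. A minimal sketch, assuming each indexed chunk carries a set of groups allowed to read it (`IndexedChunk`, `allowed_groups`, and `filter_by_acl` are hypothetical names):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    """Hypothetical index entry carrying ACL metadata alongside the text."""
    chunk_id: str
    text: str
    allowed_groups: frozenset[str]


def filter_by_acl(chunks: list[IndexedChunk], user_groups: set[str]) -> list[IndexedChunk]:
    """Keep only chunks that share at least one group with the user.

    Assumption: access is group-based and a chunk with no allowed groups
    is visible to nobody (fail closed).
    """
    return [c for c in chunks if c.allowed_groups & user_groups]
```

In practice this check is usually better expressed as a metadata filter inside the vector search itself, since filtering after retrieval can leave fewer than k usable results. The post-filter shown here is still a reasonable defense-in-depth step.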
## What to cover
- High-level architecture and main components.
- Data ingestion and indexing pipeline (chunking, embeddings, metadata).
- Retrieval strategy (top-k, filtering, reranking).
- Prompting/generation strategy (context window management, citations).
- Storage choices (vector DB, metadata store) and scaling approach.
- Caching, monitoring, evaluation, and failure modes.
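
As a starting point for the ingestion and retrieval topics above, the following sketch shows fixed-size chunking with overlap and brute-force top-k retrieval by cosine similarity. It assumes embeddings are already available as plain lists of floats; `chunk_text`, `cosine`, and `top_k` are hypothetical helpers, and a production system would likely use token-aware chunking and an approximate nearest-neighbor index instead of a linear scan.

```python
import math


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Overlap reduces the chance that an answer is cut in half at a
    chunk boundary. Sizes are in characters for simplicity.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the IDs of the k chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:k]
```

A reranking stage, if used, would take the output of `top_k` with a larger k and reorder it with a more expensive model before the prompt is assembled.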
Quick Answer: This question evaluates system design and machine-learning engineering skills for building Retrieval-Augmented Generation (RAG) systems: information retrieval, embedding-based vector search, LLM prompting, data ingestion and indexing, storage choices, and operational concerns.