Design and evaluate a RAG system
Company: Amazon
Role: Data Scientist
Category: Machine Learning
Difficulty: easy
Interview Round: Technical Screen
You are interviewing for an L5 Data Scientist role focused on LLM applications. Design a **retrieval-augmented generation (RAG)** system for an internal question-answering product over enterprise documents.
Your answer should cover:
- the end-to-end architecture, including document ingestion, chunking, embeddings, retrieval, reranking, prompt construction, generation, and citation or grounding
- how you would choose between dense retrieval, sparse retrieval, or a hybrid approach
- key tradeoffs such as latency, cost, freshness, precision vs. recall, context window limits, and hallucination risk
- how you would handle null or missing metadata, stale documents, duplicate content, and permission-sensitive documents
- how you would evaluate the system offline and online, including model-quality metrics, business metrics, and guardrail metrics
- when you would prefer RAG over fine-tuning, and what failure modes you would expect in production
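For the dense-vs-sparse-vs-hybrid discussion, a candidate might sketch how a hybrid retriever fuses the two result lists. The snippet below uses reciprocal rank fusion (RRF), one common fusion method; the document IDs and the two ranked lists are made-up examples, not part of the question.

```python
# Hypothetical sketch: fuse dense and sparse (BM25-style) rankings with
# reciprocal rank fusion (RRF). The doc IDs below are illustrative.

def rrf_fuse(rankings, k=60):
    """Combine several ranked lists of doc IDs into one hybrid ranking.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the constant from the original RRF paper and damps the
    influence of any single retriever.
    """
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_c", "doc_b"]   # from an embedding index
sparse_hits = ["doc_b", "doc_a", "doc_d"]  # from BM25 / keyword search
hybrid = rrf_fuse([dense_hits, sparse_hits])
# doc_a ranks first: it is near the top of both lists
```

RRF is attractive in an interview answer because it needs no score normalization across the two retrievers, only their ranks.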
Assume the system must support frequent document updates, provide trustworthy answers, and operate under realistic serving constraints.
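For the ingestion and chunking step named above, a minimal sketch might use fixed-size windows with overlap. This assumes whitespace tokenization and illustrative, untuned sizes; production systems usually split on sentence or section boundaries instead.

```python
# Hypothetical sketch: fixed-size chunking with overlap over whitespace
# tokens. chunk_size/overlap values are illustrative, not tuned.

def chunk_tokens(text, chunk_size=200, overlap=40):
    """Return overlapping chunks of roughly chunk_size tokens each."""
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks

chunks = chunk_tokens(" ".join(f"tok{i}" for i in range(500)))
# 3 chunks: tokens 0-199, 160-359, 320-499
```

The overlap trades index size and cost for recall: a passage that straddles a chunk boundary still appears intact in at least one chunk.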
Quick Answer: This question tests whether a candidate can design and evaluate a retrieval-augmented generation (RAG) system end to end: document ingestion, chunking, embedding and retrieval strategies, reranking, prompt construction, and grounding/citation, plus operational concerns such as latency, freshness, and permission handling. Interviewers use it to assess practical system-design and applied machine learning skills for LLM applications. A strong answer combines concrete reasoning about trade-offs (latency, cost, precision vs. recall) with a clear grasp of evaluation metrics, guardrails, and likely production failure modes.
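For the offline-evaluation part of the answer, two standard retrieval metrics, recall@k and mean reciprocal rank (MRR), can be computed over a labeled query set. The queries and relevance labels below are hypothetical:

```python
# Hypothetical sketch: recall@k and MRR over made-up labeled queries.

def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant docs that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(results):
    """Mean reciprocal rank of the first relevant doc per query."""
    total = 0.0
    for retrieved, relevant in results:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)

labeled = [
    (["d1", "d3", "d2"], {"d2"}),  # first relevant hit at rank 3
    (["d4", "d5", "d6"], {"d4"}),  # first relevant hit at rank 1
]
# MRR = (1/3 + 1) / 2 ≈ 0.667
```

Retrieval metrics like these isolate the retriever from the generator, which matters because a hallucinated answer can stem from either a retrieval miss or a generation failure.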