PracHub

Design and evaluate a RAG system

Last updated: May 11, 2026

Quick Overview

This question evaluates a candidate's competency in designing and evaluating retrieval-augmented generation (RAG) systems, including document ingestion, chunking, embedding and retrieval strategies, reranking, prompt construction, grounding/citation, operational constraints like latency and freshness, permission handling, and evaluation metrics and failure modes. It is commonly asked to assess practical system-design and applied machine learning skills for LLM applications, testing knowledge in the Machine Learning/Information Retrieval domain and requiring both practical application-level reasoning about trade-offs (latency, cost, precision vs. recall) and conceptual understanding of evaluation and guardrails.



Company: Amazon

Role: Data Scientist

Category: Machine Learning

Difficulty: easy

Interview Round: Technical Screen

Jan 12, 2026, 12:00 AM

You are interviewing for an L5 Data Scientist role focused on LLM applications. Design a retrieval-augmented generation (RAG) system for an internal question-answering product over enterprise documents.

Your answer should cover:

  • the end-to-end architecture, including document ingestion, chunking, embeddings, retrieval, reranking, prompt construction, generation, and citation or grounding
  • how you would choose between dense retrieval, sparse retrieval, or a hybrid approach
  • key tradeoffs such as latency, cost, freshness, precision vs. recall, context window limits, and hallucination risk
  • how you would handle null or missing metadata, stale documents, duplicate content, and permission-sensitive documents
  • how you would evaluate the system offline and online, including model-quality metrics, business metrics, and guardrail metrics
  • when you would prefer RAG over fine-tuning, and what failure modes you would expect in production
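For the dense vs. sparse vs. hybrid point in the list above, one common way to combine the two retrievers is reciprocal rank fusion (RRF). The sketch below is illustrative only and is not part of the original question: the function name, the chunk ids, and the constant `k=60` (a frequently used default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k=60 is an assumed default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical outputs of a sparse (BM25-style) and a dense (embedding) retriever.
sparse_hits = ["c3", "c1", "c7"]
dense_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that both retrievers return (`c1`, `c3`) rise to the top, which is the usual argument for hybrid retrieval: sparse matching catches exact terms such as product codes, while dense matching catches paraphrases. The fused list would typically be passed to a reranker before prompt construction.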

Assume the system must support frequent document updates, provide trustworthy answers, and operate under realistic serving constraints.
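The requirements on permissions, stale documents, duplicates, and missing metadata can be made concrete with a post-retrieval filter. This is a minimal sketch under stated assumptions, not a reference design: the `Chunk` fields, the `filter_chunks` name, the one-year `max_age`, and the fail-closed policy for missing metadata are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: Optional[frozenset]  # None means ACL metadata is missing
    updated_at: Optional[datetime]       # None means the timestamp is missing


def filter_chunks(chunks, user_groups, now, max_age=timedelta(days=365)):
    """Drop chunks the user may not see, stale chunks, and exact duplicates."""
    seen_digests = set()
    kept = []
    for c in chunks:
        # Fail closed (assumed policy): missing ACL metadata means no access.
        if c.allowed_groups is None or not (c.allowed_groups & user_groups):
            continue
        # Assumed policy: a missing timestamp is treated as stale.
        if c.updated_at is None or now - c.updated_at > max_age:
            continue
        # Deduplicate on a hash of case- and whitespace-normalized text.
        normalized = " ".join(c.text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen_digests:
            continue
        seen_digests.add(digest)
        kept.append(c)
    return kept


utc = timezone.utc
now = datetime(2026, 1, 12, tzinfo=utc)
chunks = [
    Chunk("a", "Refund policy: 30 days.", frozenset({"hr"}), datetime(2025, 12, 1, tzinfo=utc)),
    Chunk("b", "refund  policy: 30 days.", frozenset({"hr"}), datetime(2025, 11, 1, tzinfo=utc)),
    Chunk("c", "Payroll data", frozenset({"finance"}), datetime(2025, 12, 1, tzinfo=utc)),
    Chunk("d", "Old handbook", frozenset({"hr"}), datetime(2020, 1, 1, tzinfo=utc)),
    Chunk("e", "Unlabeled memo", None, datetime(2025, 12, 1, tzinfo=utc)),
]

kept_ids = [c.chunk_id for c in filter_chunks(chunks, {"hr"}, now)]
print(kept_ids)  # ['a']
```

In a production system the permission check would more likely be pushed into the retrieval query itself, so that restricted chunks never leave the index. Filtering after retrieval, as here, is simpler to reason about but can shrink the candidate set below the number of chunks the prompt needs.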
