PracHub
Design a RAG system end to end

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design a production-grade Retrieval‑Augmented Generation (RAG) system, testing competencies in scalable ML system architecture, embedding and vector retrieval strategies, prompt orchestration, freshness and latency engineering, security/access controls, and evaluation metrics.

Company: Amazon

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Design a retrieval-augmented generation system for enterprise text. Specify the ingestion pipeline (chunking, embedding generation, indexing), retrieval strategy (vector search, hybrid retrieval, reranking), prompt orchestration, grounding and citations, freshness handling, latency/throughput targets, and privacy controls. Discuss evaluation for relevance and answer quality, approaches to reduce hallucinations, and how you would scale and monitor the system in production.

Sep 6, 2025, 12:00 AM

Design a Retrieval‑Augmented Generation (RAG) System for Enterprise Text

Context

You are building a production RAG system that answers employee questions using internal enterprise text (wikis, PDFs, tickets, emails, docs). The data is sensitive and access-controlled. Assume multi-tenant use, mixed document formats, and English-first content, with the following baseline constraints:

  • Corpus: 5–10 million pages, tens of millions of chunks.
  • Traffic: 200 QPS peak; target end-to-end p95 latency ≤ 2.0 s with server-streamed tokens.
  • Freshness: new or updated content should be searchable within 15 minutes.
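
As a rough sanity check on these constraints, the sketch below estimates the raw vector footprint and the number of requests in flight at peak. The chunk count (30 million, one point within "tens of millions"), the 768-dimensional embedding, and float32 storage are assumptions for illustration; only the 200 QPS and 2.0 s figures come from the constraints above. Using the p95 bound in place of mean latency makes the concurrency figure a conservative estimate.

```python
# Back-of-envelope sizing. NUM_CHUNKS, DIM, and float32 storage are
# illustrative assumptions, not values given in the question.

def index_size_gib(num_chunks: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw size of a flat, uncompressed vector index in GiB."""
    return num_chunks * dim * bytes_per_value / 2**30

def in_flight_requests(qps: float, latency_s: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    return qps * latency_s

NUM_CHUNKS = 30_000_000  # assumed point within "tens of millions"
DIM = 768                # assumed embedding dimension

raw_gib = index_size_gib(NUM_CHUNKS, DIM)
# 200 QPS peak and the 2.0 s p95 bound are taken from the constraints.
concurrency = in_flight_requests(200, 2.0)

print(f"raw vectors: {raw_gib:.1f} GiB, in-flight requests: {concurrency:.0f}")
```

Under these assumptions the raw vectors alone run to tens of GiB before any index overhead, which is one reason a candidate might discuss sharding or vector compression.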

Tasks

Design the system and specify:

  1. Ingestion pipeline: chunking strategy, embedding generation, and indexing.
  2. Retrieval strategy: vector search, hybrid retrieval, and reranking.
  3. Prompt orchestration: how the LLM is instructed and grounded; how citations are produced.
  4. Freshness handling: incremental updates, cache invalidation, time-aware ranking.
  5. Latency and throughput targets with a rough budget.
  6. Privacy and security controls for enterprise data.
  7. Evaluation: measuring relevance and answer quality; datasets and metrics.
  8. Reducing hallucinations: techniques across retrieval and generation.
  9. Scale and monitoring: how you would scale, operate, and observe the system in production.
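
One possible way to approach tasks 2 and 6 together is to fuse the vector and keyword result lists and then drop anything the caller is not allowed to see. The sketch below uses reciprocal rank fusion (RRF) with the commonly used constant k = 60 and a simple group-based access check. The function names, chunk IDs, and group labels are hypothetical, and a production design would more likely push the access filter into the index query itself rather than apply it after retrieval.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

def filter_by_acl(doc_ids, doc_groups, user_groups):
    """Keep only chunks whose allowed groups intersect the user's groups."""
    return [d for d in doc_ids if doc_groups.get(d, set()) & user_groups]

# Hypothetical ranked outputs from a vector index and a keyword index.
vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]

# Hypothetical access-control metadata stored alongside each chunk.
doc_groups = {"c1": {"eng"}, "c3": {"hr"}, "c7": {"eng"}, "c9": {"eng", "hr"}}

fused = rrf_fuse([vector_hits, keyword_hits])
visible = filter_by_acl(fused, doc_groups, user_groups={"eng"})
print(fused, visible)
```

Chunks with no recorded groups are treated as inaccessible here, so missing metadata fails closed rather than leaking content.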
