Design a RAG system end to end

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design a production-grade Retrieval‑Augmented Generation (RAG) system, testing competencies in scalable ML system architecture, embedding and vector retrieval strategies, prompt orchestration, freshness and latency engineering, security/access controls, and evaluation metrics.

Company: Amazon

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Design a retrieval-augmented generation system for enterprise text. Specify the ingestion pipeline (chunking, embedding generation, indexing), retrieval strategy (vector search, hybrid retrieval, reranking), prompt orchestration, grounding and citations, freshness handling, latency/throughput targets, and privacy controls. Discuss evaluation for relevance and answer quality, approaches to reduce hallucinations, and how you would scale and monitor the system in production.

Sep 6, 2025, 12:00 AM
Design a Retrieval‑Augmented Generation (RAG) System for Enterprise Text

Context

You are building a production RAG system that answers employee questions using internal enterprise text (wikis, PDFs, tickets, emails, docs). Data is sensitive and access-controlled. Assume multi-tenant use, mixed document formats, English-first, with the following baseline constraints:

  • Corpus: 5–10 million pages, tens of millions of chunks.
  • Traffic: 200 QPS peak; target end-to-end p95 latency ≤ 2.0 s with server-streamed tokens.
  • Freshness: new or updated content should be searchable within 15 minutes.
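To make the constraints above concrete, here is a rough back-of-envelope sizing sketch. The chunk count (30 million) and embedding shape (768 float32 dimensions) are assumptions chosen only for illustration; the question gives "tens of millions of chunks" and does not name an embedding model. The helper names are hypothetical.

```python
"""Back-of-envelope sizing for the stated constraints (illustrative only)."""

# Given in the problem statement.
PEAK_QPS = 200
P95_LATENCY_S = 2.0

# Assumptions, NOT given in the question.
NUM_CHUNKS = 30_000_000      # one point within "tens of millions of chunks"
EMBED_DIM = 768              # assumed embedding dimensionality
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_chunks: int, dim: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Size of the uncompressed embedding matrix, excluding index overhead."""
    return num_chunks * dim * bytes_per_value


def in_flight_requests(qps: float, latency_s: float) -> float:
    """Little's law: concurrency = arrival rate x time in system.

    Passing the p95 latency instead of the mean gives a conservative estimate.
    """
    return qps * latency_s


if __name__ == "__main__":
    size_gib = raw_vector_bytes(NUM_CHUNKS, EMBED_DIM) / 2**30
    print(f"Raw vectors: {size_gib:.1f} GiB before index overhead or replicas")
    print(f"Requests in flight at peak: "
          f"{in_flight_requests(PEAK_QPS, P95_LATENCY_S):.0f}")
```

Under these assumptions the raw vectors alone are on the order of tens of GiB, which suggests discussing sharding or vector compression, and the concurrency figure gives a starting point for sizing the generation tier.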

Tasks

Design the system and specify:

  1. Ingestion pipeline: chunking strategy, embedding generation, and indexing.
  2. Retrieval strategy: vector search, hybrid retrieval, and reranking.
  3. Prompt orchestration: how the LLM is instructed and grounded; how citations are produced.
  4. Freshness handling: incremental updates, cache invalidation, time-aware ranking.
  5. Latency and throughput targets with a rough budget.
  6. Privacy and security controls for enterprise data.
  7. Evaluation: measuring relevance and answer quality; datasets and metrics.
  8. Reducing hallucinations: techniques across retrieval and generation.
  9. Scale and monitoring: how you would scale, operate, and observe the system in production.
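As one possible starting point for tasks 2 and 6, the sketch below fuses a vector-search ranking and a keyword ranking with reciprocal rank fusion (RRF), then drops chunks the user is not allowed to see. It is a minimal illustration, not a reference design: the function names, chunk ids, and group-based ACL model are all hypothetical, and `k = 60` is only a commonly used RRF constant. A production system would more likely enforce access control inside the retrieval query rather than as a post-filter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def rrf_fuse(rankings: Iterable[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of chunk ids using reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per chunk, with rank starting at 1.
    Ties are broken by chunk id so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda c: (-scores[c], c))


def filter_by_acl(chunk_ids: Iterable[str],
                  chunk_groups: Dict[str, Set[str]],
                  user_groups: Set[str]) -> List[str]:
    """Keep chunks that share at least one group with the user.

    Chunks with no ACL entry are treated as not visible (deny by default).
    """
    return [c for c in chunk_ids if chunk_groups.get(c, set()) & user_groups]


if __name__ == "__main__":
    # Hypothetical results from the two retrievers, best match first.
    vector_hits = ["c3", "c1", "c7"]
    keyword_hits = ["c1", "c9", "c3"]

    # Hypothetical ACL metadata stored alongside each chunk at ingestion.
    chunk_groups = {
        "c1": {"eng"},
        "c3": {"hr"},
        "c7": {"eng"},
        "c9": {"eng", "hr"},
    }

    fused = rrf_fuse([vector_hits, keyword_hits])
    visible = filter_by_acl(fused, chunk_groups, user_groups={"eng"})
    print(fused)    # ['c1', 'c3', 'c9', 'c7']
    print(visible)  # ['c1', 'c9', 'c7']
```

The surviving candidates would then go to a reranker and into the prompt with their chunk ids attached, so the generated answer can cite them.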
