Deep-dive your GenAI project architecture

This is an ML System Design interview question from Amazon for Machine Learning Engineer roles.

Question

GenAI System Deep-Dive: End-to-End Design and Scale Strategy

Provide a structured walkthrough of a production-grade GenAI system you built end-to-end. Cover the following areas:

1) Problem Definition

What user problem did you solve and for whom?
What were the success criteria and constraints (e.g., latency SLOs, cost per request, compliance)?

2) Data Sourcing and Governance

Sources and modalities (structured/unstructured, internal/external).
Size and quality (volume, coverage, freshness, label quality, dedup/OCR issues).
Privacy, PII handling, access control, consent, retention, residency.
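
One way to make the PII-handling point concrete is a redaction pass applied before text is indexed or logged. The following is a minimal sketch, not a complete detector: the pattern set, the labels, and the function name redact_pii are all assumptions made for illustration, and the phone and SSN formats are US-style only.

```python
import re

# Hypothetical pattern inventory for illustration only. A production system
# would typically need locale-aware detectors and a reviewed pattern list.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each matched span with a bracketed label such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact_pii("Contact jane.doe@example.com or 555-123-4567"))
```

In an interview answer, a sketch like this is a starting point for discussing where redaction runs (ingestion, prompt construction, logging) and how misses are measured.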

3) Model Choice and Architecture

Rationale for encoder–decoder, instruction-tuned LLM, RAG, or hybrid.
Orchestration: query routing, tools, vector retrieval, rerankers, and any function calling.
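
The vector-retrieval step mentioned above can be sketched as a brute-force cosine-similarity search. This is an illustrative assumption about how such a step might look, not a description of any particular system: the in-memory dict index and the names cosine and retrieve are hypothetical, and a real deployment would normally use an approximate nearest-neighbor index followed by a reranker.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, k=2):
    """Return the top-k (doc_id, score) pairs, highest similarity first.

    `index` is a hypothetical in-memory mapping of doc_id -> embedding.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(retrieve([1.0, 0.0], toy_index, k=2))
```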

4) Training and Fine-Tuning

Objectives (SFT, preference optimization such as DPO/KTO, contrastive learning for embeddings).
Datasets, augmentation/synthetic data, and curriculum.
Hyperparameters, scaling strategy, and infra.

5) Evaluation

Offline metrics (retrieval quality, faithfulness, toxicity, hallucination rate).
Human evaluation protocol and acceptance criteria.
Online A/B or interleaving.
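
For the retrieval-quality part of offline evaluation, two commonly reported metrics are recall@k and mean reciprocal rank (MRR). The sketch below assumes each query has a ranked list of retrieved document IDs and a set of known-relevant IDs; the function names and the data layout are illustrative choices.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant document per query (0 if none found)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for retrieved, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


if __name__ == "__main__":
    print(recall_at_k(["d1", "d2", "d3"], {"d2", "d9"}, k=2))
    print(mean_reciprocal_rank([["d1", "d2"], ["d3", "d4"]], [{"d2"}, {"d3"}]))
```

Faithfulness, toxicity, and hallucination rate generally need model-based or human judgments and are not covered by this sketch.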

6) Safety and Guardrails

Toxicity, jailbreak, prompt injection, PII leakage mitigation.
Policy enforcement and red-teaming.

7) Latency, Throughput, and Cost

End-to-end latency budget and p95 targets.
Throughput and concurrency limits.
Cost per request and major cost drivers.
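
The latency and cost items above lend themselves to simple arithmetic. The following sketch assumes token-based pricing with separate input and output rates per 1,000 tokens and uses the nearest-rank definition of p95; the prices in the usage example are made-up placeholders, not real vendor rates.

```python
import math


def cost_per_request(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Token cost of one request under assumed per-1k-token pricing."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


def p95(latencies_ms):
    """Nearest-rank 95th percentile: the smallest value with >= 95% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Placeholder prices chosen only to make the arithmetic easy to follow.
    print(cost_per_request(2000, 500, price_in_per_1k=0.5, price_out_per_1k=1.5))
    print(p95(list(range(1, 101))))
```

Token cost is usually only one driver; retrieval infrastructure, reranking, and guardrail calls would be added to the per-request total in the same way.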

8) Key Failure Modes

Where it breaks (data, retrieval, reasoning, safety) and mitigations.

9) Trade-offs

What you optimized for and what you deferred.

10) 10x Scale Plan

How you would evolve the system for 10x traffic while meeting a 200 ms p95 latency SLO and a 20% cost reduction.
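
A rough capacity and cost check for the 10x plan can be sketched with two formulas. The first is Little's law (in-flight requests = arrival rate x mean latency); note that it uses mean latency, so it sizes average concurrency rather than proving the 200 ms p95 target. The second assumes the 20% reduction is measured per request against a no-cache baseline and that a response cache is the only lever, which is a simplifying assumption; the function names are illustrative.

```python
def required_concurrency(requests_per_second, mean_latency_s):
    """Little's law: average number of requests in flight."""
    return requests_per_second * mean_latency_s


def min_cache_hit_rate(model_cost, cache_cost, target_reduction=0.20):
    """Smallest hit rate h such that (1 - h) * model_cost + h * cache_cost
    is at most (1 - target_reduction) * model_cost.

    A result above 1.0 means caching alone cannot reach the target.
    """
    if cache_cost >= model_cost:
        raise ValueError("a cache hit must be cheaper than a model call")
    return target_reduction * model_cost / (model_cost - cache_cost)


if __name__ == "__main__":
    # Hypothetical numbers: 100 rps today, 10x growth, 150 ms mean latency.
    print(required_concurrency(100 * 10, 0.150))
    print(min_cache_hit_rate(model_cost=0.010, cache_cost=0.001))
```

In practice the plan would combine several levers (smaller or distilled models, batching, quantization, caching), and each one can be added to the cost formula as another term.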