How do I approach ML System Design interview questions?

ML System Design questions require an understanding of core concepts, plus practice. PracHub provides solutions with explanations to help you master ML System Design interviews.

What difficulty level is this interview question?

This is a hard-difficulty ML System Design question, commonly asked during Technical Screen rounds at Amazon.

What role is this question designed for?

This question is commonly asked of Software Engineer candidates at Amazon during technical interviews.

Design an email spam detection system | Amazon Interview Question

Quick Overview

This interview question evaluates how you handle ML product requirements, data and labeling, modeling, serving architecture, evaluation, monitoring, and trade-offs in a realistic interview setting. A strong answer states its assumptions, handles edge cases, explains trade-offs, and shows clearly how the result would be validated.

System Design: End-to-End Email Spam Detection

Context

Design an end-to-end system that detects and handles spam emails at scale. Assume you are building for a large consumer email service handling high throughput and strict latency requirements. The design should cover data, ML, serving, experimentation, and operations.

Requirements

Problem Definition and Labeling
- Define the objective(s) and action outcomes (e.g., block, quarantine, inbox with banner).
- Labeling sources and policies.
Data Sources and Collection
- Inbound traffic, user reports, honeypots, abuse teams, reputation feeds.
- Collection, sampling, retention, and governance.
Feature Engineering
- Content features (text, URLs, attachments), headers, sender/domain/IP reputation, network/behavioral signals.
Model Choices and Training
- Baseline rules, supervised ML models, online learning.
- Handling class imbalance, feature hashing, model calibration.
Serving Architecture and Constraints
- Placement in the mail pipeline, APIs, latency/throughput targets, caching, fallbacks.
Thresholding and Calibration
- Score-to-action mapping, per-segment thresholds, calibration methods.
Evaluation Metrics
- Precision, recall, ROC/PR analysis, and cost-weighted metrics.
Abuse/Adversarial Defenses and Feedback Loops
- Evasion tactics, spoofing defenses, URL/attachment handling, user feedback integration.
Cold Start, Concept Drift, Retraining Cadence
- New senders/domains, seasonal drift, automated retraining.
Online Experimentation
- A/B testing, ramp strategies, guardrails.
Monitoring, Logging, Rollback
- Real-time and batch monitoring, alerting, safe rollback.
Privacy and Compliance
- Data minimization, encryption, regional residency, user controls.
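
To make the "Thresholding and Calibration" and "Evaluation Metrics" items above more concrete, here is a minimal sketch of a score-to-action mapping with per-segment thresholds, plus a cost-weighted evaluation at a single threshold. Everything specific in it is an assumption for illustration and not part of the prompt: the segment names, the threshold values, the action labels, and the `fp_cost`/`fn_cost` weights. It also assumes the model already outputs a calibrated spam probability in [0, 1].

```python
from dataclasses import dataclass

# Hypothetical per-segment thresholds on a calibrated spam probability.
# Each entry is (block_at, quarantine_at, banner_at); the values are illustrative.
THRESHOLDS = {
    "default": (0.99, 0.90, 0.60),
    "new_sender": (0.97, 0.80, 0.50),
}


def action_for(score: float, segment: str = "default") -> str:
    """Map a calibrated spam score to an action for one traffic segment."""
    block_at, quarantine_at, banner_at = THRESHOLDS.get(
        segment, THRESHOLDS["default"]
    )
    if score >= block_at:
        return "block"
    if score >= quarantine_at:
        return "quarantine"
    if score >= banner_at:
        return "inbox_with_banner"
    return "inbox"


@dataclass
class Metrics:
    precision: float
    recall: float
    cost: float


def evaluate(scores, labels, threshold, fp_cost=10.0, fn_cost=1.0) -> Metrics:
    """Compute precision, recall, and a cost-weighted error at one threshold.

    labels use 1 = spam and 0 = legitimate. The default weights encode the
    assumption that a false positive (legitimate mail treated as spam)
    costs more than a false negative (spam reaching the inbox).
    """
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_spam = score >= threshold
        if predicted_spam and label == 1:
            tp += 1
        elif predicted_spam and label == 0:
            fp += 1
        elif not predicted_spam and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(precision, recall, fp_cost * fp + fn_cost * fn)
```

In an interview, the useful part of a sketch like this is the discussion it invites: how the thresholds would be chosen from a PR curve on held-out data, and how the cost weights would be justified for each action.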

Constraints & Assumptions

Preserve the scope, facts, inputs, and requested outputs from the prompt above.
If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.

Clarifying Questions to Ask

Clarify users, core use cases, read/write patterns, scale, latency, availability, and data retention.
State explicit assumptions before making sizing or architecture decisions.
Prioritize the functional path first, then address reliability, security, observability, and rollout.

What a Strong Answer Covers

A scoped requirements summary with concrete non-goals and success metrics.
ML-specific data, model, evaluation, serving, and monitoring choices.
Reasoned trade-offs between simple and scalable designs, including bottlenecks and failure modes.
A validation, monitoring, migration, and launch plan appropriate for the risk level.

Follow-up Questions

What breaks first at 10x traffic or data volume?
How would you degrade gracefully during dependency failures?
What metrics and alerts would prove the design is healthy after launch?
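
For the graceful-degradation question above, one possible answer is to fall back to a cheap rule-based baseline whenever the model scorer fails. The sketch below illustrates that idea only; the function names, the email fields (`spf_fail`, `has_url`), and the rule itself are hypothetical, and a real system would likely add timeouts and a circuit breaker around the model call.

```python
from typing import Callable, Tuple


def score_with_fallback(
    email: dict,
    model_scorer: Callable[[dict], float],
    rule_scorer: Callable[[dict], float],
) -> Tuple[float, str]:
    """Return (score, source), using the rule scorer if the model call fails."""
    try:
        return model_scorer(email), "model"
    except Exception:
        # Assumed policy: any model failure (timeout, unavailable dependency)
        # degrades to the rule-based baseline instead of dropping the mail.
        return rule_scorer(email), "rules"


def simple_rules(email: dict) -> float:
    """Hypothetical baseline: suspicious if SPF fails and the mail has a URL."""
    return 0.9 if email.get("spf_fail") and email.get("has_url") else 0.1
```

Returning the score source alongside the score makes it easy to log and alert on the fallback rate, which ties back to the question about post-launch health metrics.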
