PracHub

Design an end-to-end spam detection system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's system-design and applied machine learning engineering skills: problem framing and labeling, feature representation, model selection and calibration, real-time serving constraints, drift detection, and feedback and safety mechanisms. It is commonly asked to probe trade-offs between latency, precision/recall, and robustness against adversarial evolution in production spam detection. Category: Machine Learning. It tests production ML systems competencies at both the conceptual-design and practical-application levels, with emphasis on calibration, offline and online evaluation, operational reliability, and rollback/mitigation planning.

  • hard
  • Amazon
  • Machine Learning
  • Data Scientist

Company: Amazon

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

Design an end-to-end email spam detection system. Requirements: real-time scoring with p99 latency <50 ms; minimize false positives (target precision ≥98% on hard blocks) while keeping recall high; adversaries evolve tactics. Describe: 1) Problem framing and labeling (ham vs spam; graymail; handling noisy/weak labels and delayed abuse reports). 2) Features and representations (character/word n-grams, sender/domain/IP reputation, URL features, MIME structure, lightweight embeddings), and how you’d prevent leakage (e.g., future knowledge, reply/forward chains). 3) Model choice and serving (e.g., logistic regression vs gradient boosting vs compact transformer), calibration, and thresholding for different enforcement actions (block, quarantine, tag). 4) Training pipeline, sampling to handle prevalence, and drift detection (population/stability metrics, canaries). 5) Offline metrics (PR-AUC, calibrated precision/recall at business thresholds), and online evaluation (A/B design, guardrails, holdouts). 6) Feedback loops and safety (appeals workflow, human-in-the-loop review, bias/privacy/PII handling). 7) Cost, reliability, and rollback plans. Finally, list the top three failure modes you anticipate and concrete mitigations for each.
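Delayed abuse reports (item 1) and future-knowledge or reply/forward-chain leakage (item 2) are often handled together with a strict time-based split plus a label-maturity window: train only on messages old enough that late reports have likely arrived, evaluate on a later window, and keep whole threads on one side of the boundary. The sketch below is one way to do this, assuming a hypothetical `Message` record with `thread_id`, `sent_at`, and `label` fields and a 7-day maturity window chosen purely for illustration; `time_split` is likewise a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    # Hypothetical record; field names are illustrative assumptions.
    msg_id: str
    thread_id: str      # groups a message with its replies/forwards
    sent_at: datetime
    label: int          # 1 = spam, 0 = ham, as known when labels are snapshotted


def time_split(messages, train_end, test_end, maturity=timedelta(days=7)):
    """Split messages by time so that training never sees the future.

    Train: messages sent before train_end - maturity, so late abuse reports
           (up to `maturity` after sending) have landed by train_end.
    Test:  messages sent in [train_end, test_end); their labels should be
           snapshotted only after test_end + maturity.
    Threads with any message in the test window are dropped from training,
    so reply/forward chains cannot carry test content into training.
    """
    label_cutoff = train_end - maturity
    test = [m for m in messages if train_end <= m.sent_at < test_end]
    test_threads = {m.thread_id for m in test}
    train = [
        m for m in messages
        if m.sent_at < label_cutoff and m.thread_id not in test_threads
    ]
    return train, test


# Usage on a toy set of five messages.
t0 = datetime(2025, 1, 1)
msgs = [
    Message("a", "t1", t0, 0),
    Message("b", "t2", t0 + timedelta(days=5), 1),   # inside maturity window: excluded
    Message("c", "t3", t0 + timedelta(days=2), 1),   # its thread reappears in test: excluded
    Message("d", "t3", t0 + timedelta(days=12), 1),
    Message("e", "t4", t0 + timedelta(days=11), 0),
]
train, test = time_split(msgs, t0 + timedelta(days=10), t0 + timedelta(days=14))
print([m.msg_id for m in train], [m.msg_id for m in test])  # ['a'] ['d', 'e']
```

Dropping entire threads is conservative and costs some training data; the maturity window trades label freshness against label completeness and would need to be tuned to how quickly abuse reports actually arrive.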

Related Interview Questions

  • Explain Core ML Interview Concepts - Amazon (hard)
  • Evaluate NLP Classification Models - Amazon (easy)
  • Explain overfitting, regularization, and LLM techniques - Amazon (medium)
  • Explain NLP/RL concepts used in LLM agents - Amazon (hard)
  • Design and evaluate a RAG system - Amazon (easy)
Amazon
Oct 13, 2025, 9:49 PM
Data Scientist
Technical Screen
Machine Learning

Design an End-to-End Email Spam Detection System

You are asked to design a production-grade email spam detection system that meets the following constraints:

  • Real-time scoring with p99 latency < 50 ms.
  • Minimize false positives (target precision ≥ 98% for hard blocks), while keeping recall high.
  • Adversaries evolve tactics continuously.
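The precision constraint turns threshold selection into a constrained optimization: among score thresholds whose validation precision is at least 98%, pick the one with the highest recall. Because lowering the threshold can only add flagged messages, recall never decreases as the threshold drops, so the lowest qualifying threshold is the best one; precision, however, is not monotone, so every candidate has to be checked. A minimal sketch, assuming calibrated scores on a recent, labeled validation set; `threshold_for_precision` is a hypothetical helper name.

```python
def threshold_for_precision(scores, labels, target_precision=0.98):
    """Return (threshold, precision, recall) for the lowest threshold t such
    that flagging every message with score >= t meets target_precision on
    this validation set, or None if no threshold does.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    best = None
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume all items tied at score s: they are flagged together.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        if precision >= target_precision:
            # Scanning from high to low score, so the last hit is the lowest threshold.
            best = (s, precision, tp / total_pos if total_pos else 0.0)
    return best


scores = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60]
labels = [1, 1, 1, 0, 1, 0]
print(threshold_for_precision(scores, labels))        # (0.9, 1.0, 0.75)
print(threshold_for_precision(scores, labels, 0.78))  # (0.7, 0.8, 1.0)
```

With realistic volumes, a point estimate of 98% can be noisy near the threshold, so it may be safer to require a lower confidence bound on precision to clear the target rather than the raw estimate.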

Address the following:

  1. Problem Framing and Labeling
    • Define classes: ham vs spam, and graymail (legitimate but unwanted marketing/notifications).
    • Discuss labeling sources and strategy, including handling noisy/weak labels and delayed abuse reports.
  2. Features and Representations
    • Propose key features: character/word n-grams, sender/domain/IP reputation, URL features, MIME structure, lightweight embeddings.
    • Explain how to prevent data leakage (e.g., future knowledge, reply/forward chains, time-based leakage).
  3. Model Choice and Serving
    • Compare models (logistic regression, gradient boosting, compact transformer) given latency and adversarial drift.
    • Describe calibration and thresholding for different enforcement actions (block, quarantine, tag).
  4. Training Pipeline, Sampling, and Drift Detection
    • Outline the end-to-end training pipeline and sampling to handle class imbalance.
    • Describe drift detection (population/stability metrics, canaries) and retraining triggers.
  5. Evaluation
    • Offline: metrics such as PR-AUC; calibrated precision/recall at business thresholds.
    • Online: A/B design, guardrails, and holdouts.
  6. Feedback Loops and Safety
    • Appeals workflow, human-in-the-loop review.
    • Bias, privacy, and PII handling.
  7. Cost, Reliability, and Rollback
    • Compute/latency budget, reliability/SLOs, and rollback plans.
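For the population/stability metrics in item 4, a common choice is the Population Stability Index (PSI), computed per feature or on the model score: bin a baseline sample at its quantiles, then compare bin fractions in a recent sample via PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ). The sketch below is a stdlib-only illustration; the 10 bins and the small `eps` floor for empty bins are assumptions, and the frequently quoted cut-offs (below 0.1 stable, above 0.25 significant shift) are rules of thumb rather than universal constants.

```python
import math
from bisect import bisect_right


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` (recent) vs. `expected` (baseline).

    Bin edges are baseline quantiles, so each bin holds about 1/n_bins of the
    baseline. eps floors empty bins to avoid log(0) and division by zero.
    """
    base = sorted(expected)
    # n_bins - 1 interior cut points taken at baseline quantiles.
    edges = [base[int(len(base) * k / n_bins)] for k in range(1, n_bins)]

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1  # index in 0..n_bins-1
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


baseline = [i / 1000 for i in range(1000)]       # e.g., last month's spam scores
shifted = [0.5 + i / 2000 for i in range(1000)]  # mass moved to the upper half
print(round(psi(baseline, baseline), 6))  # 0.0
print(psi(baseline, shifted) > 0.25)      # True
```

In a pipeline, a PSI alert on the score distribution or on key features would typically open an investigation or trigger retraining, rather than automatically changing enforcement thresholds.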

Finally, list the top three failure modes you anticipate and concrete mitigations for each.
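One concrete mitigation for a bad model release is an automated canary guardrail tied to the rollback plan: route a small slice of traffic to the candidate model and roll back if it breaches the 50 ms p99 latency SLO or if its false-positive proxy is significantly worse than control. The sketch below assumes, hypothetically, that "not spam" appeals on blocked or quarantined mail serve as that proxy, and uses a one-sided two-proportion z-test with a critical value of 2.33 (roughly the 1% level); `ArmStats` and `should_roll_back` are illustrative names, not an existing API.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmStats:
    # Hypothetical per-arm counters collected during a canary rollout.
    flagged: int            # messages blocked or quarantined
    fp_reports: int         # "not spam" appeals on those messages (FP proxy)
    p99_latency_ms: float   # measured p99 scoring latency


def should_roll_back(control, canary, max_p99_ms=50.0, z_crit=2.33):
    """Return (rollback, reason) for a canary model vs. the control model."""
    if canary.p99_latency_ms > max_p99_ms:
        return True, "latency SLO breached"
    if control.flagged == 0 or canary.flagged == 0:
        raise ValueError("need flagged traffic in both arms")
    p_c = control.fp_reports / control.flagged
    p_k = canary.fp_reports / canary.flagged
    # Pooled proportion under the null hypothesis that both arms share one rate.
    pooled = (control.fp_reports + canary.fp_reports) / (control.flagged + canary.flagged)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control.flagged + 1 / canary.flagged))
    if se == 0:
        return p_k > p_c, "degenerate variance"
    z = (p_k - p_c) / se
    if z > z_crit:
        return True, f"false-positive proxy up (z={z:.2f})"
    return False, "within guardrails"


control = ArmStats(flagged=10_000, fp_reports=20, p99_latency_ms=30.0)
print(should_roll_back(control, ArmStats(10_000, 60, 31.0))[0])  # True
print(should_roll_back(control, ArmStats(10_000, 22, 31.0)))     # (False, 'within guardrails')
```

Checking this test repeatedly during a rollout inflates the false-alarm rate, so a production version would more likely use a fixed evaluation schedule or a sequential testing method.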



© 2026 PracHub. All rights reserved.