Design an email spam detection system

Q: Design an email spam detection system

This is an ML System Design interview question from Amazon for Software Engineer roles. View the full question and solution on PracHub.

Q: How do I approach ML System Design interview questions?

ML System Design questions require an understanding of core concepts, plus practice. PracHub provides solutions with explanations to help you master ML System Design interviews.

Question

System Design: End-to-End Email Spam Detection

Context

Design an end-to-end system that detects and handles spam emails at scale. Assume you are building for a large consumer email service handling high throughput and strict latency requirements. The design should cover data, ML, serving, experimentation, and operations.

Requirements

Problem Definition and Labeling
- Define the objective(s) and action outcomes (e.g., block, quarantine, inbox with banner).
- Labeling sources and policies.
Data Sources and Collection
- Inbound traffic, user reports, honeypots, abuse teams, reputation feeds.
- Collection, sampling, retention, and governance.
Feature Engineering
- Content features (text, URLs, attachments), headers, sender/domain/IP reputation, network/behavioral signals.
Model Choices and Training
- Baseline rules, supervised ML models, online learning.
- Handling class imbalance, feature hashing, model calibration.
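As an illustration of two of the items above, here is a minimal sketch of feature hashing and inverse-frequency class weighting using only the Python standard library. The function names `hash_features` and `class_weights`, the bucket count, and the use of MD5 as the hash are assumptions made for this example; a production system would more likely rely on a library implementation and a faster hash.

```python
import hashlib
from collections import Counter


def hash_features(tokens, num_buckets=2**18):
    """Map tokens to a sparse {bucket_index: count} vector (the hashing trick).

    Hypothetical helper: the vocabulary is never stored, so unseen tokens
    from new spam campaigns still land in a bucket. Collisions are possible.
    """
    vec = Counter()
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % num_buckets
        vec[idx] += 1
    return dict(vec)


def class_weights(labels):
    """Inverse-frequency weights so each class contributes equally to the loss."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


if __name__ == "__main__":
    print(hash_features(["free", "free", "winner"], num_buckets=16))
    print(class_weights([0, 0, 0, 1]))  # the rare class gets the larger weight
```

The weights can be passed to any learner that accepts per-class or per-example weights; resampling is an alternative when the learner does not.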
Serving Architecture and Constraints
- Placement in the mail pipeline, APIs, latency/throughput targets, caching, fallbacks.
Thresholding and Calibration
- Score-to-action mapping, per-segment thresholds, calibration methods.
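A score-to-action mapping with per-segment thresholds could be sketched as follows. The threshold values, the `new_sender` segment, and the names `Thresholds` and `decide` are illustrative assumptions, not values from the question; the scores are assumed to be calibrated spam probabilities in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    block: float       # score at or above this: reject the message
    quarantine: float  # score at or above this: deliver to the spam folder
    banner: float      # score at or above this: inbox with a warning banner


# Hypothetical values; in practice these would be tuned on validation data.
DEFAULT = Thresholds(block=0.99, quarantine=0.90, banner=0.60)
SEGMENT_THRESHOLDS = {
    # Assumed stricter policy for senders with no history.
    "new_sender": Thresholds(block=0.97, quarantine=0.80, banner=0.50),
}


def decide(score, segment="default"):
    """Map a calibrated spam score to an action, using segment thresholds if any."""
    t = SEGMENT_THRESHOLDS.get(segment, DEFAULT)
    if score >= t.block:
        return "block"
    if score >= t.quarantine:
        return "quarantine"
    if score >= t.banner:
        return "banner"
    return "inbox"


if __name__ == "__main__":
    for s in (0.995, 0.85, 0.10):
        print(s, decide(s), decide(s, "new_sender"))
```

Keeping the thresholds in data rather than code lets them be changed or rolled back without redeploying the model.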
Evaluation Metrics
- Precision, recall, ROC/PR analysis, and cost-weighted metrics.
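The metrics listed above can be computed at a fixed threshold as in the sketch below. The 10:1 cost ratio between a false positive (legitimate mail treated as spam) and a false negative (spam reaching the inbox) is an assumption chosen for illustration; the real ratio is a product decision.

```python
def confusion_counts(y_true, y_score, threshold):
    """Count TP, FP, FN, TN where label 1 means spam and score >= threshold predicts spam."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, y_score):
        pred = s >= threshold
        if pred and y == 1:
            tp += 1
        elif pred and y == 0:
            fp += 1
        elif not pred and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def expected_cost(fp, fn, cost_fp=10.0, cost_fn=1.0):
    """Cost-weighted error; the default costs are hypothetical."""
    return cost_fp * fp + cost_fn * fn


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1]
    y_score = [0.9, 0.4, 0.7, 0.1, 0.8]
    tp, fp, fn, tn = confusion_counts(y_true, y_score, threshold=0.5)
    print(precision_recall(tp, fp, fn), expected_cost(fp, fn))
```

Sweeping `threshold` over a grid and recording precision and recall at each point yields the PR curve; picking the threshold that minimizes `expected_cost` is one way to connect evaluation to the action policy.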
Abuse/Adversarial Defenses and Feedback Loops
- Evasion tactics, spoofing defenses, URL/attachment handling, user feedback integration.
Cold Start, Concept Drift, Retraining Cadence
- New senders/domains, seasonal drift, automated retraining.
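One possible drift signal for triggering retraining is the Population Stability Index (PSI) between a reference score distribution and a recent one. This is a technique chosen for the example, not something the question prescribes; the bin count and the helper names `histogram` and `psi` are assumptions, and any alert threshold should be validated on historical data.

```python
import math


def histogram(scores, num_bins=10):
    """Proportion of scores in each equal-width bin over [0, 1]."""
    counts = [0] * num_bins
    for s in scores:
        idx = min(int(s * num_bins), num_bins - 1)  # put a score of 1.0 in the last bin
        counts[idx] += 1
    n = len(scores)
    return [c / n for c in counts]


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Proportions are floored at eps (an assumed smoothing value) so that
    empty bins do not produce log(0) or division by zero.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


if __name__ == "__main__":
    reference = histogram([0.05, 0.10, 0.20, 0.90, 0.95])
    recent = histogram([0.60, 0.70, 0.80, 0.90, 0.95])
    print(psi(reference, recent))
```

PSI is zero for identical distributions and grows as they diverge, so it can serve as an input to an automated retraining trigger alongside label-based metrics.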
Online Experimentation
- A/B testing, ramp strategies, guardrails.
Monitoring, Logging, Rollback
- Real-time and batch monitoring, alerting, safe rollback.
Privacy and Compliance
- Data minimization, encryption, regional residency, user controls.
