System Design: Real-Time Payment Fraud Detection
Context
Design a real-time fraud detection system for online payments (card-not-present). The system must score each transaction during authorization and decide whether to approve, decline, or route to manual review within a tight latency budget.
Assume:

- End-to-end p95 decision latency budget: 100 ms (from feature retrieval to decision), with soft degradations permitted.
- Labels (e.g., chargebacks) arrive with delays of weeks. You must train with delayed/noisy labels and operate with streaming features.
Requirements
Discuss and propose designs for:
- Events and Labels
  - What events to ingest (e.g., authorizations, captures, refunds, chargebacks, disputes, user actions).
  - How to define positive/negative labels (chargebacks, disputes) and handle label delay.
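One way to handle label delay is a maturation window: a transaction is labeled fraud once a chargeback arrives, labeled legitimate only after the window has fully elapsed with no chargeback, and excluded from training while still immature. A minimal sketch, assuming a 90-day maturation horizon (an illustrative choice, not a figure from the prompt):

```python
from datetime import datetime, timedelta

# Illustrative maturation window; real horizons depend on the card network's
# chargeback time limits.
MATURATION = timedelta(days=90)

def label_transaction(txn_time, chargeback_time, now):
    """Return 1 (fraud), 0 (legit), or None (immature, exclude from training)."""
    if chargeback_time is not None:
        return 1                      # chargeback observed -> positive
    if now - txn_time >= MATURATION:
        return 0                      # window elapsed with no chargeback -> negative
    return None                       # still inside the window -> not yet labelable
```

Excluding immature transactions (rather than labeling them negative) avoids systematically mislabeling recent fraud that simply has not charged back yet.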
- Feature Store
  - Feature categories (user, device, merchant, payment instrument, velocity, graph/network features).
  - Offline vs. online stores, consistency, TTL, backfilling, and time-travel for training.
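Velocity features illustrate the TTL requirement: the online store must answer "how many transactions on this card in the last hour" and evict stale events. A minimal in-process sketch (a production online store such as Redis with key TTLs would replace this; the window size is an illustrative assumption):

```python
from collections import deque

class VelocityCounter:
    """Sliding-window event counter per key (e.g., card id, device id)."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.events = {}  # key -> deque of event timestamps (seconds)

    def record(self, key, ts):
        self.events.setdefault(key, deque()).append(ts)

    def count(self, key, now):
        q = self.events.get(key, deque())
        while q and now - q[0] > self.window:
            q.popleft()               # evict events older than the window (TTL)
        return len(q)
```

The same windowed aggregation must be reproducible offline (time-travel) so training features match what the online store would have served at decision time.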
- Model Selection
  - Compare tree ensembles, deep models (e.g., sequence or representation models), and anomaly detection for cold start.
  - Calibration, class imbalance handling, and cost-sensitive learning.
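Cost-sensitive learning can be implemented via per-example sample weights passed to the learner, weighting each example by the dollar cost of misclassifying it rather than by raw class frequency. A sketch with illustrative (assumed) cost figures:

```python
def sample_weight(is_fraud, amount, decline_cost_rate=0.02):
    """Weight a training example by its misclassification cost in dollars.

    Missing a fraud (false negative) costs roughly the transaction amount
    (chargeback plus fees); declining a good customer (false positive) costs
    the lost margin. The 2% margin rate is an illustrative assumption.
    """
    if is_fraud:
        return amount                       # cost of a false negative
    return amount * decline_cost_rate       # cost of a false positive
```

Because weighting distorts predicted probabilities, a post-hoc calibration step (e.g., Platt scaling or isotonic regression on a held-out set) is typically needed before the scores are used in expected-value thresholding.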
- Rule Engine + Model Ensemble
  - Combining deterministic rules with ML scores, ensembling strategies, and reason codes.
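A common combination pattern: hard rules short-circuit the model entirely, the model score drives the remaining decisions, and every decision carries reason codes for downstream review and audit. A sketch with assumed rule names and thresholds (the 0.9/0.5 cutoffs and $10,000 limit are illustrative, not from the prompt):

```python
def decide(txn, model_score, decline_at=0.9, review_at=0.5):
    """Return (decision, reason_codes) combining rules with a model score."""
    # Deterministic rules evaluated first: cheap, auditable, short-circuiting.
    if txn.get("card_on_blocklist"):
        return "decline", ["RULE_BLOCKLIST"]
    if txn.get("amount", 0) > 10_000:
        return "review", ["RULE_HIGH_AMOUNT"]
    # Model score drives the remaining band.
    if model_score >= decline_at:
        return "decline", ["MODEL_HIGH_RISK"]
    if model_score >= review_at:
        return "review", ["MODEL_MEDIUM_RISK"]
    return "approve", []
```

Keeping rules first also gives a natural rules-only degradation path when the model is unavailable.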
- Data Pipeline and Streaming Inference
  - Ingestion, stream processing, feature computation, online retrieval, and a low-latency inference service.
- Latency Budgets and Fallbacks
  - Budget breakdown, caching, degradation paths (e.g., rules-only), and idempotency.
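A degradation path can be enforced with a hard deadline around model scoring: if the model misses its slice of the budget, fall back to a rules-only score so the authorization never blocks past the SLO. A sketch assuming a 50 ms model slice of the 100 ms end-to-end budget (the split is an assumption):

```python
import concurrent.futures
import time

def score_with_fallback(model_fn, rules_fn, features, timeout_s=0.05):
    """Score with the model under a deadline; degrade to rules on timeout."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(model_fn, features)
    try:
        result = (future.result(timeout=timeout_s), "model")
    except concurrent.futures.TimeoutError:
        result = (rules_fn(features), "rules_fallback")
    pool.shutdown(wait=False)   # do not block the request on the stragglers
    return result

# Demo scorers (illustrative): a model within budget, one blowing it, and a
# conservative rules-only score.
def fast_model(features):
    return 0.30

def slow_model(features):
    time.sleep(0.2)
    return 0.30

def rules_only(features):
    return 0.90
```

Logging which path produced each decision ("model" vs. "rules_fallback") is what makes the degradation rate monitorable as an SLO.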
- Thresholding and Trade-offs
  - How to set thresholds to balance false positives vs. fraud loss; expected value formulation.
- Human-in-the-Loop Review
  - Review queue design, sampling strategies, SLAs, active learning, and feedback loops.
- Concept Drift and Adversarial Adaptation
  - Continuous training, drift detection, canaries, and defenses.
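A common drift signal is the Population Stability Index (PSI) between a reference and a current distribution of scores or features, binned into buckets. A sketch (the conventional rule of thumb treats PSI > 0.2 as significant shift; the smoothing epsilon is an implementation choice):

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)   # clamp to avoid log(0)
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total
```

Running PSI per feature (not just on the final score) helps localize whether drift comes from the population or from an upstream pipeline break.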
- Explainability Requirements
  - Feature attributions, rule traces, and audit logging.
- Online Experiments
  - A/B/shadow testing, guardrail metrics, ramp policy, and bias control.
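Ramp policy usually relies on deterministic bucketing: hash a stable unit so the same entity lands in the same arm on every retry and at every ramp stage. A sketch assuming the card id as the bucketing unit and a salted hash (both are illustrative choices):

```python
import hashlib

def assign_arm(unit_id, ramp=0.05, salt="exp_v2"):
    """Deterministically route a unit to treatment with probability ~ramp."""
    h = hashlib.sha256(f"{salt}:{unit_id}".encode()).hexdigest()
    bucket = int(h[:8], 16) / 0xFFFFFFFF   # uniform-ish value in [0, 1]
    return "treatment" if bucket < ramp else "control"
```

Changing the salt per experiment prevents carry-over bias, and raising `ramp` only grows the treatment group without reshuffling existing assignments.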
- Monitoring and Alerting
  - Precision at top-K, approval rate, fraud rate, latency SLOs, data quality, and feature drift.
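Precision at top-K asks: of the K highest-risk transactions in a period, what fraction were actually fraud. It tracks ranking quality at the operating point reviewers actually see, independent of the score's calibration. A minimal sketch (K would typically be tied to review capacity, an assumption here):

```python
def precision_at_k(scores, labels, k):
    """Fraction of the k top-scored transactions that are labeled fraud."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    top = ranked[:k]
    return sum(label for _, label in top) / k
```

Because labels mature late, this metric is computed retrospectively over matured cohorts rather than in real time.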
- Incident Response and Rollback
  - Kill switches, model/version rollback, runbooks, and postmortems.