Design a real-time payment fraud detection system. Discuss: events and labels (chargebacks, disputes), feature store (user, device, merchant, graph features), model selection (tree ensembles, deep models, anomaly detection), rule engine + model ensemble, data pipeline and streaming inference, latency budgets and fallbacks, thresholding to balance false positives vs. fraud loss, human-in-the-loop review, concept drift and adversarial adaptation, explainability requirements, online experiments, monitoring (precision at top-K, approval rate, fraud rate), and incident response/rollback.

This question evaluates competency in ML system design for fraud detection, including real-time streaming inference, feature store architecture, delayed/noisy label handling, model selection and ensembling, latency budgeting, monitoring, and operational MLOps considerations.

What difficulty level is this interview question?

This is a hard ML System Design question, commonly asked during Technical Screen rounds at Amazon.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Amazon during technical interviews.

Design a fraud detection system | Amazon Interview Question

System Design: Real-Time Payment Fraud Detection

Context

Design a real-time fraud detection system for online payments (card-not-present). The system must score each transaction during authorization and decide whether to approve, decline, or route to manual review within a tight latency budget.

Assume:

End-to-end p95 decision latency budget: 100 ms (from feature retrieval to decision), with soft degradations permitted.
Labels (e.g., chargebacks) arrive with delays (weeks). You must train with delayed/noisy labels and operate with streaming features.

Requirements

Discuss and propose designs for:

Events and Labels

What events to ingest (e.g., authorizations, captures, refunds, chargebacks, disputes, user actions).
How to define positive/negative labels (chargebacks, disputes) and handle label delay.
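One possible way to handle label delay is to treat a transaction as a confirmed negative only after a maturity window has passed with no chargeback, and to leave younger transactions unlabeled. The sketch below illustrates that idea; the 90-day window and the function and parameter names are assumptions for illustration, not values taken from the question.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed maturity window; real dispute windows depend on the card network.
MATURITY = timedelta(days=90)


def assign_label(txn_time: datetime,
                 chargeback_time: Optional[datetime],
                 as_of: datetime) -> Optional[int]:
    """Return 1 (fraud), 0 (legitimate), or None (label not yet mature).

    Only information available at `as_of` is used, so the same function
    can rebuild historical training sets without leaking future labels.
    """
    if chargeback_time is not None and chargeback_time <= as_of:
        return 1
    if as_of - txn_time >= MATURITY:
        return 0
    return None  # too recent to call legitimate; exclude from training
```

Transactions that return None can be held out of training or down-weighted until they mature.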

Feature Store

Feature categories (user, device, merchant, payment instrument, velocity, graph/network features).
Offline vs. online stores, consistency, TTL, backfilling, and time-travel for training.
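Time-travel for training usually means a point-in-time lookup: each training row sees only the feature value that had been written at or before the transaction timestamp. A minimal sketch follows, assuming a hypothetical in-memory feature log keyed by entity id; a production offline store would do the same join at scale.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical feature history: entity id -> [(timestamp, value), ...],
# sorted by timestamp in ascending order.
FeatureLog = Dict[str, List[Tuple[int, float]]]


def feature_as_of(log: FeatureLog, entity: str, ts: int) -> Optional[float]:
    """Return the latest value written at or before `ts`, or None if none exists."""
    history = log.get(entity, [])
    times = [t for t, _ in history]
    idx = bisect_right(times, ts)  # number of writes with timestamp <= ts
    return history[idx - 1][1] if idx > 0 else None
```

Using the same lookup semantics offline and online is one way to reduce training/serving skew.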

Model Selection

Compare tree ensembles, deep models (e.g., sequence or representation models), and anomaly detection for cold start.
Calibration, class imbalance handling, and cost-sensitive learning.

Rule Engine + Model Ensemble

Combining deterministic rules with ML scores, ensembling strategies, and reason codes.
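One simple combination strategy is to let hard rules override the model and to attach a reason code to every decision. The sketch below assumes hypothetical transaction fields (`card_on_blocklist`, `allowlisted_customer`), illustrative thresholds, and made-up reason-code names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Decision:
    action: str                                  # "approve", "review", or "decline"
    reasons: List[str] = field(default_factory=list)


def decide(txn: dict, score: float,
           review_at: float = 0.5, decline_at: float = 0.9) -> Decision:
    """Apply deterministic rules first, then fall through to model-score bands."""
    if txn.get("card_on_blocklist"):
        return Decision("decline", ["RULE_BLOCKLISTED_CARD"])
    if txn.get("allowlisted_customer"):
        return Decision("approve", ["RULE_ALLOWLIST"])
    if score >= decline_at:
        return Decision("decline", ["MODEL_SCORE_HIGH"])
    if score >= review_at:
        return Decision("review", ["MODEL_SCORE_MEDIUM"])
    return Decision("approve", ["MODEL_SCORE_LOW"])
```

Because rules are evaluated before the score, the same function can serve as a rules-only path when the model is unavailable by passing a neutral score.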

Data Pipeline and Streaming Inference

Ingestion, stream processing, feature computation, online retrieval, and a low-latency inference service.

Latency Budgets and Fallbacks

Budget breakdown, caching, degradation paths (e.g., rules-only), and idempotency.
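A rules-only degradation path can be sketched as a model call with a deadline: if the model does not answer within its slice of the budget, or raises, the caller falls back to rules. The function names and the 50 ms default below are assumptions; the model's actual share of the 100 ms budget is a design choice.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared worker pool for model calls (size is an arbitrary illustration).
_pool = ThreadPoolExecutor(max_workers=4)


def score_with_fallback(model_fn, rules_fn, txn, timeout_s=0.05):
    """Return (result, source). Uses model_fn within timeout_s, else rules_fn."""
    future = _pool.submit(model_fn, txn)
    try:
        return future.result(timeout=timeout_s), "model"
    except FutureTimeout:
        future.cancel()  # best effort; an already running call is not interrupted
        return rules_fn(txn), "rules_fallback"
    except Exception:
        # Model errors also degrade to rules rather than failing the payment.
        return rules_fn(txn), "rules_fallback"
```

Logging the returned source makes it possible to monitor how often the system runs in degraded mode.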

Thresholding and Trade-offs

How to set thresholds to balance false positives vs. fraud loss; expected value formulation.
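An expected value formulation can be sketched as follows, assuming a calibrated fraud probability p and two hypothetical cost inputs: the loss from approving a fraudulent transaction and the cost of declining a legitimate one. Approving costs p × fraud_loss in expectation, declining costs (1 − p) × false_decline_cost, so declining is cheaper when p exceeds false_decline_cost / (fraud_loss + false_decline_cost).

```python
def decline_threshold(fraud_loss: float, false_decline_cost: float) -> float:
    """Probability above which declining has lower expected cost than approving."""
    if fraud_loss <= 0 or false_decline_cost < 0:
        raise ValueError("fraud_loss must be positive and false_decline_cost non-negative")
    return false_decline_cost / (fraud_loss + false_decline_cost)


def should_decline(p_fraud: float, fraud_loss: float,
                   false_decline_cost: float) -> bool:
    """Compare the expected cost of approving against the expected cost of declining."""
    expected_cost_approve = p_fraud * fraud_loss
    expected_cost_decline = (1.0 - p_fraud) * false_decline_cost
    return expected_cost_approve > expected_cost_decline
```

Because both costs can vary per transaction (for example with the amount), this gives a per-transaction threshold rather than one global cutoff. A manual-review band can be added by including a review cost as a third option.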

Human-in-the-Loop Review

Review queue design, sampling strategies, SLAs, active learning, and feedback loops.

Concept Drift and Adversarial Adaptation

Continuous training, drift detection, canaries, and defenses.
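One commonly used drift signal is the Population Stability Index (PSI), which compares the binned distribution of a feature or score in a reference window against a recent window. The sketch below assumes the inputs are already per-bin fractions; the alerting cutoff is a team choice, with values around 0.2 to 0.25 often cited as a rule of thumb for a significant shift.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    `expected` and `actual` hold the fraction of observations per bin.
    Empty bins are clipped to `eps` to keep the logarithm finite.
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total
```

PSI only flags distribution shift in the inputs or scores; it does not by itself show that fraud patterns have changed, so it is usually paired with label-based metrics once labels mature.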

Explainability Requirements

Feature attributions, rule traces, and audit logging.

Online Experiments

A/B/shadow testing, guardrail metrics, ramp policy, and bias control.
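A ramp policy needs a stable assignment so that the same unit always lands in the same arm and raising the ramp percentage only adds units to treatment. One way to sketch this is deterministic hash bucketing; the function name, the choice of unit (for example a customer or card id), and the 100-bucket granularity are assumptions.

```python
import hashlib


def in_treatment(unit_id: str, experiment: str, ramp_percent: int) -> bool:
    """Deterministically assign a unit to treatment for a given ramp percentage."""
    if not 0 <= ramp_percent <= 100:
        raise ValueError("ramp_percent must be between 0 and 100")
    # Salting with the experiment name keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < ramp_percent
```

Setting the ramp to 0 acts as an immediate off switch for the treatment arm without a deploy.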

Monitoring and Alerting

Precision at top-K, approval rate, fraud rate, latency SLOs, data quality, and feature drift.
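Precision at top-K measures the share of confirmed fraud among the K highest-scored transactions, which matches a review team with fixed capacity K. A minimal sketch, assuming labels are 1 for fraud and 0 for legitimate:

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of positives among the k highest-scored items."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(label for _, label in top) / len(top)
```

Because labels arrive late, this metric is typically computed on matured cohorts, with approval rate and score distributions serving as faster proxies.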

Incident Response and Rollback

Kill switches, model/version rollback, runbooks, and postmortems.
