Design fraud detection from raw transactions
System Design: End-to-End Transaction Fraud Detection
Context
You are given a large, multi-table dataset of transactions and customer/merchant metadata. Fraud labels arrive with delays (e.g., chargebacks weeks later) and may be partial (e.g., only for reviewed or disputed transactions). Design an end-to-end system to decide, in real time, whether to approve, decline, or send each transaction to manual review.
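To make the delayed-label problem concrete, here is a minimal sketch of assigning a training label "as of" a given date. The 90-day maturity window, the `Transaction` fields, and the `label_as_of` helper are illustrative assumptions, not part of the prompt. The sketch covers label delay only; partial labels (for example, declined transactions that never receive an outcome) need separate treatment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: chargebacks may arrive up to 90 days after the transaction.
MATURITY_WINDOW = timedelta(days=90)


@dataclass
class Transaction:
    txn_id: str
    ts: datetime
    chargeback_ts: Optional[datetime] = None  # None if no chargeback has been seen


def label_as_of(txn: Transaction, as_of: datetime) -> Optional[int]:
    """Return 1 (fraud), 0 (legitimate), or None (label not yet mature)."""
    # Use a chargeback only if it was already known at `as_of`.
    if txn.chargeback_ts is not None and txn.chargeback_ts <= as_of:
        return 1
    # No chargeback observed: trust the negative only after the window has passed.
    if as_of - txn.ts >= MATURITY_WINDOW:
        return 0
    return None
```

Transactions that return `None` would be held out of training and of TPR/FPR reporting until they mature, which is one way to avoid counting undiscovered fraud as legitimate.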
Requirements
Cover the following aspects with clear assumptions and rationale:
- Data and Feature Engineering
  - Velocity features (multi-horizon counts/sums/uniques).
  - Graph/link features across entities (user/card/email/device/IP/merchant).
  - Device and IP signals (fingerprinting, geolocation, proxy/TOR, ASN risk).
- Labels, Class Imbalance, and Latency
  - Handling severe class imbalance.
  - Handling delayed/partial labels and selective-label bias.
- Training and Validation Splits
  - Splitting to avoid leakage in time and across entities.
  - Ensuring offline/online feature parity.
- Decision Thresholding and Review Capacity
  - Approve/Decline/Review policy with cost-sensitive thresholds.
  - Meeting a fixed manual review capacity.
- Real-Time Scoring and Latency Budgets
  - Online feature retrieval and model serving under strict latency.
  - Fallbacks and degradation strategies.
- Feedback Loops
  - Incorporating manual review outcomes and chargebacks.
  - Exploration/holdout strategies to mitigate bias.
- Monitoring and Alerting
  - Drift (input/output), TPR/FPR with delayed labels, calibration, approval/decline rates, review queue health.
- Backtesting Plan
  - Time-ordered replay, off-policy evaluation, simulation of review capacity, metrics and confidence intervals.
State reasonable assumptions if needed and justify key design choices.
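As an illustration of the thresholding requirement, the following sketch derives a cost-sensitive decline threshold and then sizes the review band to a fixed capacity. It assumes the model outputs calibrated fraud probabilities, and the cost values and function names (`decline_threshold`, `review_threshold`, `decide`) are hypothetical. It is one possible policy, not the only valid one.

```python
from typing import List


def decline_threshold(fraud_loss: float, false_decline_cost: float) -> float:
    """Probability above which declining has lower expected cost than approving.

    Expected cost of approving: p * fraud_loss.
    Expected cost of declining: (1 - p) * false_decline_cost.
    Setting the two equal gives p* = false_decline_cost / (false_decline_cost + fraud_loss).
    """
    return false_decline_cost / (false_decline_cost + fraud_loss)


def review_threshold(scores: List[float], t_decline: float, review_capacity: int) -> float:
    """Lowest score routed to review, chosen from a recent sample of scores.

    Takes the `review_capacity` highest scores below `t_decline`.
    Tied scores at the cutoff can push the review count slightly over capacity.
    """
    below = sorted((s for s in scores if s < t_decline), reverse=True)
    if review_capacity <= 0 or not below:
        return t_decline  # empty review band: everything below t_decline is approved
    k = min(review_capacity, len(below))
    return below[k - 1]


def decide(score: float, t_review: float, t_decline: float) -> str:
    """Three-way decision for a single transaction score."""
    if score >= t_decline:
        return "decline"
    if score >= t_review:
        return "review"
    return "approve"
```

In practice the review threshold would be recomputed on a rolling window of recent scores so that the queue tracks capacity as traffic and the score distribution shift.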
Constraints & Assumptions
- Preserve the scope, facts, inputs, and requested outputs from the prompt above.
- If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
- Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.
Clarifying Questions to Ask
- Clarify users, core use cases, read/write patterns, scale, latency, availability, and data retention.
- State explicit assumptions before making sizing or architecture decisions.
- Prioritize the functional path first, then address reliability, security, observability, and rollout.
What a Strong Answer Covers
- A scoped requirements summary with concrete non-goals and success metrics.
- ML-specific data, model, evaluation, serving, and monitoring choices.
- Reasoned trade-offs among simple and scalable designs, including bottlenecks and failure modes.
- A validation, monitoring, migration, and launch plan appropriate for the risk level.
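As one example of making the data choices concrete, the velocity features named in the requirements can be computed point-in-time, so that a transaction never sees itself or later events. This is a minimal sketch that assumes each entity's event timestamps are available as a sorted list of epoch seconds; `velocity_counts` is a hypothetical helper name.

```python
from bisect import bisect_left
from typing import Dict, Sequence


def velocity_counts(
    event_ts: Sequence[float], now: float, horizons_s: Sequence[int]
) -> Dict[int, int]:
    """Count prior events in the window [now - h, now) for each horizon h.

    `event_ts` must be sorted ascending (epoch seconds) for a single entity,
    such as one card or one device.
    """
    # Events strictly before `now`: the current transaction and any later
    # events are excluded, which prevents temporal leakage.
    end = bisect_left(event_ts, now)
    counts: Dict[int, int] = {}
    for h in horizons_s:
        start = bisect_left(event_ts, now - h)
        counts[h] = end - start
    return counts
```

Using the same function for offline replay and online serving is one simple way to keep feature definitions in parity. Sums and distinct counts follow the same windowing pattern.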
Follow-up Questions
- What breaks first at 10x traffic or data volume?
- How would you degrade gracefully during dependency failures?
- What metrics and alerts would prove the design is healthy after launch?
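For the graceful-degradation question, one common pattern is to bound the feature lookup with a timeout and fall back to a simpler rule-based score when the lookup is slow or fails. The sketch below assumes hypothetical callables `fetch_features`, `model_score`, and `rule_score`, and a 50 ms budget chosen purely for illustration.

```python
import concurrent.futures as cf
from typing import Any, Callable, Dict, Tuple

# Shared worker pool for feature lookups (pool size is an arbitrary assumption).
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def score_with_fallback(
    txn: Dict[str, Any],
    fetch_features: Callable[[Dict[str, Any]], Dict[str, Any]],
    model_score: Callable[[Dict[str, Any]], float],
    rule_score: Callable[[Dict[str, Any]], float],
    timeout_s: float = 0.05,
) -> Tuple[float, str]:
    """Return (score, path), where path is "model" or "fallback"."""
    future = _POOL.submit(fetch_features, txn)
    try:
        # Raises on timeout, and re-raises any error from the feature store.
        features = future.result(timeout=timeout_s)
    except Exception:
        # Degrade to rules that need only fields carried on the transaction.
        # Note: the timed-out lookup keeps running in its worker thread.
        return rule_score(txn), "fallback"
    return model_score(features), "model"
```

Returning the path alongside the score lets monitoring track the fallback rate, which is a useful health signal in its own right.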