Design a robust fraud detection system

Q: Design a robust fraud detection system

This question evaluates a candidate's competency in end-to-end machine learning system design for real-time fraud detection. It covers time-aware data splitting, feature engineering with high-cardinality categoricals under severe class imbalance, model selection under latency and cost constraints, calibration and thresholding, monitoring during delayed-label periods, safe online rollout, and adversarial defenses. It is commonly asked to assess the ability to balance statistical trade-offs against production engineering requirements, with an emphasis on practical, application-level system design that also requires a conceptual understanding of delayed labels, cost-sensitive evaluation, and operational monitoring.

Question

Real-Time Card Fraud Detector — End-to-End Design

Context

Fraud base rate ≈ 0.2% (severe class imbalance)
Labels arrive with a 14-day delay (e.g., chargebacks/confirmed fraud)
Latency SLO: p95 inference < 50 ms; throughput 2k TPS
Cost matrix (per decision): FP = $5 (lost conversion + manual review), FN = $200 (average fraud loss after recovery)
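
One way to read the cost matrix above: if blocking a legitimate transaction costs $5 and approving a fraudulent one costs $200, then for a calibrated fraud probability p, blocking is cheaper in expectation when 200 * p > 5 * (1 - p), that is, when p > 5/205 ≈ 0.024. The sketch below works through that arithmetic. It assumes correct decisions cost nothing and that the model's scores are calibrated probabilities; the function names are hypothetical and not part of the question.

```python
FP_COST = 5.0    # from the context: cost of blocking a legitimate transaction
FN_COST = 200.0  # from the context: cost of approving a fraudulent transaction


def break_even_threshold(fp_cost: float = FP_COST, fn_cost: float = FN_COST) -> float:
    """Probability above which blocking has lower expected cost than approving.

    Assumption: correct decisions cost nothing and p is a calibrated probability.
    """
    return fp_cost / (fp_cost + fn_cost)


def expected_cost(p_fraud: float, block: bool) -> float:
    """Expected cost of a single decision given fraud probability p_fraud."""
    if block:
        # Blocking only costs money when the transaction was legitimate.
        return (1.0 - p_fraud) * FP_COST
    # Approving only costs money when the transaction was fraudulent.
    return p_fraud * FN_COST


def decide(p_fraud: float) -> bool:
    """Return True (block) when blocking is the cheaper action in expectation."""
    return expected_cost(p_fraud, block=True) < expected_cost(p_fraud, block=False)
```

Under these assumptions, a transaction scored at the 0.2% base rate is approved, and the block decision only becomes cheaper once the score exceeds roughly 2.4%. In practice the threshold would still be tuned on validation data, since the costs are averages and calibration is rarely perfect.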

Tasks

Data/Labeling: Propose time-aware train/validation/test splits that respect the 14‑day label delay and avoid leakage from post-transaction outcomes (chargeback windows, reversals). Provide a concrete split scheme and rationale.
Features: Propose 10+ robust features (velocity, device/merchant risk, graph features, etc.). Explain handling of high‑cardinality categoricals and target leakage pitfalls. Describe how you would ensure feature freshness in production.
Modeling: Compare supervised approaches (e.g., XGBoost, calibrated deep nets) versus anomaly detection (e.g., Isolation Forest) given sparse positives. When and how would you hybridize them?
Evaluation: Choose metrics (PR AUC vs ROC AUC vs expected cost) and justify. Design a thresholding procedure that maximizes expected profit under the given cost matrix. Provide the optimization objective and describe probability calibration.
Drift/Monitoring: Define concrete drift and performance monitors (population drift, PSI/JS, calibration, expected cost per transaction). How would you operate during the 14‑day label delay period?
Online Rollout: Propose a safe shadow/holdback plan and guardrails to cap business risk (e.g., block‑rate ceilings, human‑in‑the‑loop). How do you reconcile offline metrics with online KPIs?
Adversarial Behavior: Describe 3 defenses against adaptive fraudsters (e.g., randomization, ensembling with behavior‑based models, canary features) and how you would validate they work.
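
As an illustration of the Data/Labeling task, here is a minimal sketch of one possible time-aware split. It is an assumed scheme, not a reference answer: windows are laid out chronologically, only transactions whose labels have had 14 days to mature are used, and a 14-day gap separates consecutive windows so that each later window begins only after the labels of the earlier one would have been available in production. The names `Window` and `time_aware_split` and the window lengths are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

LABEL_DELAY = timedelta(days=14)  # from the context: labels arrive 14 days late


@dataclass(frozen=True)
class Window:
    start: date  # inclusive transaction date
    end: date    # exclusive transaction date


def time_aware_split(as_of: date, train_days: int, val_days: int, test_days: int) -> dict:
    """Build chronological train/val/test windows, working backward from as_of.

    Assumed scheme: the newest usable transaction is LABEL_DELAY older than
    as_of (its label has matured), and consecutive windows are separated by
    LABEL_DELAY to mimic the lag between training data and deployment.
    """
    test_end = as_of - LABEL_DELAY
    test_start = test_end - timedelta(days=test_days)

    val_end = test_start - LABEL_DELAY
    val_start = val_end - timedelta(days=val_days)

    train_end = val_start - LABEL_DELAY
    train_start = train_end - timedelta(days=train_days)

    return {
        "train": Window(train_start, train_end),
        "val": Window(val_start, val_end),
        "test": Window(test_start, test_end),
    }


splits = time_aware_split(date(2024, 6, 30), train_days=90, val_days=14, test_days=14)
for name, window in splits.items():
    print(name, window.start, window.end)
```

The split only controls which transactions land in which set. Avoiding leakage also requires that every feature for a transaction be computed from data available strictly before that transaction's timestamp, which this sketch does not enforce.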
