
Design a robust fraud detection system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in end-to-end machine learning system design for real-time fraud detection: time-aware data splitting, feature engineering for high-cardinality and severely imbalanced data, model selection under latency and cost constraints, calibration and thresholding, monitoring during the delayed-label period, safe online rollout, and adversarial defenses. It assesses the ability to balance statistical trade-offs against production engineering requirements, along with conceptual understanding of delayed labels, cost-sensitive evaluation, and operational monitoring.


Company: Capital One

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

You’re tasked with building a real-time fraud detector for card transactions.

Context:

  • Class imbalance: fraud rate ≈ 0.2%.
  • Labels arrive with a 14-day delay (chargebacks/confirmed fraud).
  • Latency SLO: p95 inference < 50 ms; throughput 2k TPS.
  • Cost matrix (per decision): FP = $5 (lost conversion + manual review), FN = $200 (average fraud loss after recovery).

Tasks:

  1. Data/labeling: Describe how you would construct time-aware train/validation/test splits and avoid leakage from post-transaction outcomes (e.g., chargeback windows, reversals). Specify a concrete split scheme and rationale.
  2. Features: Propose 10+ robust features (e.g., velocity, device/merchant risk, graph features). Explain handling of high-cardinality categoricals and target-leakage pitfalls. How would you implement feature freshness guarantees?
  3. Modeling: Compare supervised methods (e.g., XGBoost, calibrated deep nets) vs. anomaly detection (e.g., Isolation Forest) given sparse positives. When would you hybridize?
  4. Evaluation: Choose metrics and justify (PR AUC vs. ROC AUC vs. expected cost). Design a thresholding procedure that maximizes expected profit under the given cost matrix. Show the exact optimization objective and how you’d calibrate probabilities.
  5. Drift/monitoring: Define concrete drift and performance monitors (populations, PSI/JS, calibration, cost per transaction). How would you operate in the 14-day label delay period?
  6. Online rollout: Propose a safe shadow/holdback plan and guardrails to cap business risk (e.g., block-rate ceilings, human-in-the-loop). How do you reconcile offline metrics with online KPIs?
  7. Adversarial behavior: Describe three defenses against adaptive fraudsters (e.g., randomization, ensembling with behavior-based models, canary features) and how you’d validate that they work.


Reported: Oct 13, 2025

Real-Time Card Fraud Detector — End-to-End Design

Context

  • Fraud base rate ≈ 0.2% (severe class imbalance)
  • Labels arrive with a 14-day delay (e.g., chargebacks/confirmed fraud)
  • Latency SLO: p95 inference < 50 ms; throughput 2k TPS
  • Cost matrix (per decision): FP = $5 (lost conversion + manual review); FN = $200 (average fraud loss after recovery)

Tasks

  1. Data/Labeling: Propose time-aware train/validation/test splits that respect the 14‑day label delay and avoid leakage from post-transaction outcomes (chargeback windows, reversals). Provide a concrete split scheme and rationale.
  2. Features: Propose 10+ robust features (velocity, device/merchant risk, graph features, etc.). Explain handling of high‑cardinality categoricals and target leakage pitfalls. Describe how you would ensure feature freshness in production.
  3. Modeling: Compare supervised approaches (e.g., XGBoost, calibrated deep nets) versus anomaly detection (e.g., Isolation Forest) given sparse positives. When and how would you hybridize them?
  4. Evaluation: Choose metrics (PR AUC vs ROC AUC vs expected cost) and justify. Design a thresholding procedure that maximizes expected profit under the given cost matrix. Provide the optimization objective and describe probability calibration.
  5. Drift/Monitoring: Define concrete drift and performance monitors (population drift, PSI/JS, calibration, expected cost per transaction). How would you operate during the 14‑day label delay period?
  6. Online Rollout: Propose a safe shadow/holdback plan and guardrails to cap business risk (e.g., block‑rate ceilings, human‑in‑the‑loop). How do you reconcile offline metrics with online KPIs?
  7. Adversarial Behavior: Describe 3 defenses against adaptive fraudsters (e.g., randomization, ensembling with behavior‑based models, canary features) and how you would validate they work.
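For Task 1, one concrete scheme is a rolling time-based split whose windows all end before the labels mature, so no example is evaluated against a chargeback outcome that would not yet have been observable. A minimal sketch, assuming each transaction is a dict with a `timestamp` field (the window lengths and field name are illustrative, not from the prompt):

```python
from datetime import datetime, timedelta

def time_aware_split(transactions, as_of,
                     train_days=90, val_days=14, test_days=14,
                     label_delay_days=14):
    """Rolling time-based split with a label-maturity cutoff.

    Only transactions whose 14-day chargeback window has fully closed
    before `as_of` are eligible, so every example carries a trustworthy
    label; train < validation < test in strict time order prevents
    leakage from post-transaction outcomes.
    """
    mature_cutoff = as_of - timedelta(days=label_delay_days)
    test_start = mature_cutoff - timedelta(days=test_days)
    val_start = test_start - timedelta(days=val_days)
    train_start = val_start - timedelta(days=train_days)

    train, val, test = [], [], []
    for tx in transactions:
        ts = tx["timestamp"]
        if train_start <= ts < val_start:
            train.append(tx)
        elif val_start <= ts < test_start:
            val.append(tx)
        elif test_start <= ts < mature_cutoff:
            test.append(tx)
        # anything newer than mature_cutoff is still maturing: excluded
    return train, val, test
```

In practice this would be re-run on a schedule (rolling-origin evaluation), and each transaction's features must be computed as of its own timestamp, not the split date.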
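For the velocity features in Task 2, the usual building block is a streaming count of events per entity over a sliding time window. A sketch with in-process state for illustration (a production feature path would keep these queues in a low-latency store such as Redis, which is an assumption, not something the prompt specifies):

```python
from collections import defaultdict, deque

class VelocityCounter:
    """Per-card transaction count over a sliding time window.

    Amortized O(1) per event: each timestamp is appended once and
    evicted at most once, which fits a p95 < 50 ms feature path.
    """
    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.events = defaultdict(deque)  # card_id -> recent timestamps

    def update_and_count(self, card_id, ts):
        """Record a transaction at time `ts` (seconds) and return the
        number of transactions for this card within the window."""
        q = self.events[card_id]
        q.append(ts)
        while q and q[0] <= ts - self.window:
            q.popleft()  # evict timestamps that fell out of the window
        return len(q)
```

The same pattern extends to sums (amount velocity) or distinct-merchant counts; the count itself is leakage-free because it only uses events strictly at or before the scoring time.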
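For Task 4, with well-calibrated probabilities the cost-minimizing decision rule has a closed form: block when p ≥ C_FP / (C_FP + C_FN) = 5 / 205 ≈ 0.0244. Raw scores from, e.g., XGBoost should first be calibrated (Platt scaling or isotonic regression) so this holds. The sketch below shows the expected-cost objective and an empirical grid search over thresholds on a mature-labeled validation set:

```python
def expected_cost(y_true, probs, threshold, c_fp=5.0, c_fn=200.0):
    """Average per-decision cost under the prompt's cost matrix:
    blocking a legitimate transaction costs c_fp, approving a
    fraudulent one costs c_fn, correct decisions cost 0."""
    cost = 0.0
    for y, p in zip(y_true, probs):
        blocked = p >= threshold
        if blocked and y == 0:
            cost += c_fp       # false positive
        elif not blocked and y == 1:
            cost += c_fn       # false negative
    return cost / len(y_true)

def best_threshold(y_true, probs, c_fp=5.0, c_fn=200.0, steps=1000):
    """Grid search minimizing expected cost; for calibrated probabilities
    this should land near the analytic optimum c_fp / (c_fp + c_fn)."""
    candidates = [i / steps for i in range(1, steps)]
    return min(candidates,
               key=lambda t: expected_cost(y_true, probs, t, c_fp, c_fn))
```

Maximizing expected profit is equivalent to minimizing this expected cost; reporting cost per transaction alongside PR AUC keeps the offline metric directly comparable to the online business KPI.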
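For the drift monitors in Task 5, PSI on score and feature distributions needs no labels, which is exactly what makes it usable during the 14-day blind window. A minimal sketch with equal-width bins for brevity (production monitors more commonly use quantile bins fitted on the baseline):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a live
    sample. Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 major shift worth an alert."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            i = min(max(int((x - lo) / width), 0), bins - 1)  # clip outliers
            counts[i] += 1
        eps = 1e-6  # avoid log(0) for empty bins
        return [max(c / len(sample), eps) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Running this daily on the model's score distribution, block rate, and top features gives label-free tripwires; once labels mature, realized cost per transaction and calibration error backfill the picture.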

