
Design a robust fraud detection system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in end-to-end machine learning system design for real-time fraud detection, covering time-aware data splitting, feature engineering for high-cardinality and severely imbalanced classes, model selection under latency and cost constraints, calibration and thresholding, monitoring during delayed-label periods, safe online rollout, and adversarial defenses. It is commonly asked to assess the ability to balance statistical trade-offs and production engineering requirements in the Machine Learning domain, emphasizing practical application-level system design that also requires conceptual understanding of delayed labels, cost-sensitive evaluation, and operational monitoring.

  • hard
  • Capital One
  • Machine Learning
  • Data Scientist


Company: Capital One

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite



Oct 13, 2025, 9:49 PM

Real-Time Card Fraud Detector — End-to-End Design

Context

  • Fraud base rate ≈ 0.2% (severe class imbalance)
  • Labels arrive with a 14-day delay (e.g., chargebacks/confirmed fraud)
  • Latency SLO: p95 inference < 50 ms; throughput 2k TPS
  • Cost matrix (per decision): FP = $5 (lost conversion + manual review), FN = $200 (average fraud loss after recovery)
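
As a rough sanity check on the cost matrix above, the break-even fraud probability can be worked out directly. The sketch below assumes that correct decisions cost nothing and that the model score is a calibrated probability; neither is stated in the prompt, and the function names are illustrative only.

```python
FP_COST = 5.0    # cost of blocking a legitimate transaction (from the prompt)
FN_COST = 200.0  # cost of approving a fraudulent transaction (from the prompt)


def break_even_threshold(fp_cost: float, fn_cost: float) -> float:
    """Probability above which blocking is cheaper in expectation.

    Blocking costs (1 - p) * fp_cost; approving costs p * fn_cost.
    Setting the two equal gives p* = fp_cost / (fp_cost + fn_cost).
    Assumption: true positives and true negatives cost nothing.
    """
    return fp_cost / (fp_cost + fn_cost)


def expected_cost(p_fraud: float, block: bool) -> float:
    """Expected cost of one decision given a calibrated fraud probability."""
    return (1.0 - p_fraud) * FP_COST if block else p_fraud * FN_COST


threshold = break_even_threshold(FP_COST, FN_COST)
print(f"break-even threshold: {threshold:.4f}")
```

Under these assumptions the break-even point is 5 / 205, roughly 2.4%, which is about twelve times the 0.2% base rate. In practice the threshold would likely be tuned on a held-out window, since review capacity and calibration error can move it away from this idealized value.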

Tasks

  1. Data/Labeling: Propose time-aware train/validation/test splits that respect the 14‑day label delay and avoid leakage from post-transaction outcomes (chargeback windows, reversals). Provide a concrete split scheme and rationale.
  2. Features: Propose 10+ robust features (velocity, device/merchant risk, graph features, etc.). Explain handling of high‑cardinality categoricals and target leakage pitfalls. Describe how you would ensure feature freshness in production.
  3. Modeling: Compare supervised approaches (e.g., XGBoost, calibrated deep nets) versus anomaly detection (e.g., Isolation Forest) given sparse positives. When and how would you hybridize them?
  4. Evaluation: Choose metrics (PR AUC vs ROC AUC vs expected cost) and justify. Design a thresholding procedure that maximizes expected profit under the given cost matrix. Provide the optimization objective and describe probability calibration.
  5. Drift/Monitoring: Define concrete drift and performance monitors (population drift, PSI/JS, calibration, expected cost per transaction). How would you operate during the 14‑day label delay period?
  6. Online Rollout: Propose a safe shadow/holdback plan and guardrails to cap business risk (e.g., block‑rate ceilings, human‑in‑the‑loop). How do you reconcile offline metrics with online KPIs?
  7. Adversarial Behavior: Describe 3 defenses against adaptive fraudsters (e.g., randomization, ensembling with behavior‑based models, canary features) and how you would validate they work.
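
For task 1, one possible split scheme is sketched below. The start date, window lengths, and the `Split` helper are assumptions chosen for illustration and are not part of the prompt. The only constraint taken from the prompt is the 14-day label delay, applied here as a gap between windows so that every label in one window has matured before the next window begins.

```python
from dataclasses import dataclass
from datetime import date, timedelta

LABEL_DELAY = timedelta(days=14)  # from the prompt


@dataclass(frozen=True)
class Split:
    """Hypothetical helper: a named, half-open date window."""
    name: str
    start: date  # inclusive
    end: date    # exclusive

    def contains(self, txn_date: date) -> bool:
        return self.start <= txn_date < self.end


def make_splits(first_day: date, train_days: int, val_days: int,
                test_days: int) -> list[Split]:
    """Chronological train/validation/test windows separated by the label delay."""
    train = Split("train", first_day, first_day + timedelta(days=train_days))
    val_start = train.end + LABEL_DELAY
    val = Split("validation", val_start, val_start + timedelta(days=val_days))
    test_start = val.end + LABEL_DELAY
    test = Split("test", test_start, test_start + timedelta(days=test_days))
    return [train, val, test]


def label_is_mature(txn_date: date, as_of: date) -> bool:
    """A label is usable only once the full delay has elapsed."""
    return txn_date + LABEL_DELAY <= as_of


# Illustrative window lengths; real values depend on data volume and seasonality.
splits = make_splits(date(2025, 1, 1), train_days=120, val_days=30, test_days=30)
for s in splits:
    print(s.name, s.start, s.end)
```

The gap mimics production, where a model trained today cannot see outcomes for the most recent 14 days. Features would need the same point-in-time discipline, computed only from events timestamped before the transaction being scored.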
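
For the drift monitors in task 5, the Population Stability Index (PSI) needs no labels, so it can be tracked on scores or features during the 14-day label delay. The following is a minimal sketch, assuming fixed bin edges taken from a reference window and a small smoothing constant for empty bins. The bin edges and sample scores are made up for illustration.

```python
import math
from bisect import bisect_right


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of values falling in each bin defined by sorted interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    # eps keeps empty bins from producing log(0) or division by zero.
    return [max(c / total, eps) for c in counts]


def psi(reference, current, edges):
    """Population Stability Index: sum((cur - ref) * ln(cur / ref)) over bins."""
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


edges = [0.01, 0.02, 0.05, 0.10]  # hypothetical score bin edges
reference_scores = [0.005, 0.012, 0.015, 0.03, 0.04, 0.07, 0.008, 0.2]
shifted_scores = [0.06, 0.12, 0.15, 0.3, 0.04, 0.07, 0.08, 0.2]

psi_same = psi(reference_scores, reference_scores, edges)
psi_shifted = psi(reference_scores, shifted_scores, edges)
print(psi_same, psi_shifted)
```

Identical distributions give a PSI of zero, and every bin contributes a non-negative term, so the value grows as the current window drifts from the reference. Alert thresholds are a judgment call and would likely be set per feature from historical week-over-week variation.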