This question tests the ability to design a real-time ML decisioning system for payment fraud, covering model selection under strict latency constraints and severe class imbalance. It evaluates competency in ML system design, feature engineering with delayed labels, and trade-offs between false positives and false negatives in a financial context.
##### Question
Given two credit-card transactions with limited information, decide whether to accept or decline each and justify your reasoning. Outline a full fraud-detection strategy for card transactions: data required, feature engineering, real-time rules, and model-based approaches. Explain how unsupervised learning can be applied to detect fraudulent transactions and list suitable algorithms. Detail appropriate evaluation metrics for fraud models, including how to assess unsupervised methods without labeled data.
Credit-Card Fraud Detection: Real-Time Decisioning and System Design
You are designing a real-time decisioning system for card-payment authorizations at a large payments company. At authorization time only a subset of features is available and you have a hard latency budget; the ground-truth outcome of a transaction (e.g. a chargeback) arrives weeks to months later.
For the accept/decline exercise, assume the following two minimal transaction snippets are available at decision time:
Transaction A
Channel: e-commerce (card-not-present)
Amount: $4,200
Cardholder home country: UK
Merchant country: US (electronics)
Local time at cardholder: 03:17
Device fingerprint: new to platform
IP geolocation: NG (Nigeria), VPN likely
Velocity: 5 auth attempts in the last 10 minutes on this card across different merchants
Transaction B
Channel: card-present (EMV chip + PIN)
Amount: $18.75
Cardholder home country: UK
Merchant country: UK (coffee shop)
Local time at cardholder: 12:41
Device/terminal: known merchant terminal, low dispute history
Velocity: consistent with the user's past pattern (daily coffee purchases)
This is an open-ended design problem. Work through it in four parts below.
Constraints & Assumptions
Latency: the model's contribution to the auth decision must fit inside a sub-100 ms end-to-end budget; aim for scoring in the low tens of milliseconds.
Class imbalance: confirmed fraud is typically well under 1% of transactions.
Label delay & censoring: chargeback / confirmed-fraud labels arrive weeks to months later; declined transactions never receive an outcome label; some disputes are "friendly fraud" (mislabeled).
Action space: for each transaction you may approve, decline, step-up authenticate (e.g. 3-D Secure / SCA, OTP, push approval), or queue for asynchronous review.
Objective: minimize expected dollar loss, balancing fraud losses against false-decline (lost-sale + customer-friction) cost, rather than maximizing raw accuracy.
Clarifying Questions to Ask
A strong candidate scopes the problem before designing. Reasonable questions include:
What is the relative cost of a fraud loss versus a false decline, and does it vary by amount tier or channel? (This sets the decision thresholds.)
What is the latency SLA and the available compute / feature-store infrastructure?
Which authentication channels (3DS/SCA, OTP) are actually available, and what is the regulatory context (e.g. PSD2 SCA mandates, liability-shift rules)?
What label sources exist (chargeback reason codes, confirmed-fraud reports, manual-review outcomes) and how mature/reliable are they?
Are we deciding on the issuing side, acquiring side, or as a network/PSP — i.e. whose fraud loss are we minimizing?
What is the current baseline (rules engine? existing model?) and the operations team's review capacity?
Part 1 — Accept / decline the two transactions
For Transaction A and Transaction B, decide whether to accept, decline, or take a conditional action (e.g. step-up authentication), and justify your reasoning from the available signals. State explicitly what you would do if a conditional action's prerequisite (e.g. a 3DS challenge) is unavailable or fails.
What This Part Should Cover
A clear decision per transaction with reasoning grounded in the specific signals (velocity, geo mismatch, channel/EMV presence, amount, device tenure, time-of-day).
Identification of whether each transaction's signals stack toward a coherent risk narrative or point in an exculpatory direction, without conflating independent risk indicators with each other.
Awareness that more than two actions are available and that the appropriate action depends on the risk band, with a stated fallback when a conditional action is unavailable or fails.
An explicit cost trade-off (expected fraud loss vs. false-decline cost), ideally noting it scales with transaction amount.
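The cost trade-off for Part 1 can be sketched as an expected-cost comparison across actions. This is a minimal illustration, not a prescribed method: the `CostModel` fields, their default values, and the fraud probabilities passed in for Transactions A and B are all hypothetical placeholders, and asynchronous review is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    # Every value here is an illustrative assumption, not a figure from the question.
    false_decline_rate: float = 0.30  # lost value per blocked good txn, as a share of amount
    stepup_abandon: float = 0.10      # share of genuine users who abandon a challenge
    stepup_fraud_pass: float = 0.05   # share of fraud attempts that still pass a challenge


def expected_costs(p_fraud, amount, cm=CostModel(), step_up_available=True):
    """Expected dollar cost of each action for one transaction."""
    p_good = 1.0 - p_fraud
    fd_cost = cm.false_decline_rate * amount
    costs = {
        "approve": p_fraud * amount,   # lose the amount if it is fraud
        "decline": p_good * fd_cost,   # lose the sale/goodwill if it is genuine
    }
    if step_up_available:
        costs["step_up"] = (
            p_fraud * cm.stepup_fraud_pass * amount   # fraud that passes the challenge
            + p_good * cm.stepup_abandon * fd_cost    # genuine users who drop out
        )
    return costs


def best_action(p_fraud, amount, cm=CostModel(), step_up_available=True):
    """Pick the action with the lowest expected cost."""
    costs = expected_costs(p_fraud, amount, cm, step_up_available)
    return min(costs, key=costs.get)


# Hypothetical fraud probabilities, chosen only to illustrate the two risk bands.
action_a = best_action(p_fraud=0.60, amount=4200.00)
action_a_fallback = best_action(p_fraud=0.60, amount=4200.00, step_up_available=False)
action_b = best_action(p_fraud=0.001, amount=18.75)
print(action_a, action_a_fallback, action_b)
```

Under these assumed numbers, Transaction A is routed to step-up and falls back to decline when no challenge is available, while Transaction B is approved. Because every cost term scales with the amount, the same probability can justify different actions at different ticket sizes once a fixed friction cost is added.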
Part 2 — End-to-end fraud-detection strategy
Outline a full fraud-detection strategy for card transactions. Cover: (a) the data required (what is available synchronously at auth vs. maintained from history vs. enrichment/labels), (b) feature engineering, (c) real-time deterministic rules, and (d) model-based approaches, including system architecture and how you meet the latency budget.
What This Part Should Cover
Data inventory split by availability: synchronous auth fields, historical per-entity profiles & velocity counters (queryable in single-digit ms), and enrichment/label sources stored with arrival timestamps.
Feature families: behavioral deviation/z-scores from the cardholder's own baseline, velocity/aggregation across multiple windows, geo/device (impossible travel, device novelty), merchant/BIN risk, graph/linkage, authentication outcomes.
Point-in-time correctness / leakage discipline and online–offline feature parity.
Rules layer (hard blocks, velocity caps, mismatch rules, allowlists) that is versioned and monitored, sitting in front of the model.
Model choice (gradient-boosted trees as the tabular workhorse, with calibration; optional sequence/graph models; cascade for cost control) and a decisioning architecture with feature store, timeouts/fallbacks, shadow→champion/challenger rollout, and drift monitoring.
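The velocity features and the point-in-time discipline listed above can be illustrated with a small in-memory counter. This is a sketch under stated assumptions: `VelocityCounter`, its method names, and the three window lengths are hypothetical, and a production system would back this with a low-latency feature store rather than a Python dict.

```python
from bisect import bisect_left, insort


class VelocityCounter:
    """Per-card auth-attempt timestamps (epoch seconds), kept sorted for window queries."""

    WINDOWS_S = {"10m": 600, "1h": 3600, "24h": 86400}  # assumed window lengths

    def __init__(self):
        self._events = {}  # card_id -> sorted list of timestamps

    def record(self, card_id, ts):
        # insort keeps the list ordered even if events arrive out of order.
        insort(self._events.setdefault(card_id, []), ts)

    def count(self, card_id, as_of, window_s):
        """Attempts in [as_of - window_s, as_of): only events strictly before `as_of`."""
        events = self._events.get(card_id, [])
        lo = bisect_left(events, as_of - window_s)
        hi = bisect_left(events, as_of)  # excludes the transaction being scored
        return hi - lo

    def features(self, card_id, as_of):
        """Same code path for online scoring and offline replay, to keep parity."""
        return {
            f"auth_count_{name}": self.count(card_id, as_of, w)
            for name, w in self.WINDOWS_S.items()
        }


vc = VelocityCounter()
for ts in (0, 100, 200, 300, 400, 1000):
    vc.record("card_1", ts)
print(vc.features("card_1", as_of=500))
```

Excluding events at or after `as_of` is what keeps a training replay from seeing information that was not available at authorization time, and calling the same `features` function in both settings is one simple way to approach online/offline parity.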
Part 3 — Unsupervised learning for fraud detection
Explain how unsupervised learning can be applied to fraud detection here, why it earns a place given the label situation, and list suitable algorithm families. Be specific about how an unsupervised signal is integrated into the overall system rather than treated as a standalone blocker.
What This Part Should Cover
Justification tied to label scarcity/delay and detection of novel/cold-start fraud the supervised model hasn't seen.
A spread of algorithm families (isolation-based, density/distance, one-class/boundary, reconstruction/autoencoder, clustering, graph-based, sequence-based), not a single algorithm.
Integration patterns: as a feature into the supervised model, as a router to step-up/review, as a candidate generator for active learning, or a PU-learning hybrid.
The outlier-≠-fraud caveat and starting in alert/feature mode before auto-actioning.
Part 4 — Evaluation metrics
Detail appropriate evaluation metrics for fraud models. Address: (a) how you turn a model score into an action threshold given the cost asymmetry, (b) the right offline ranking/quality metrics under extreme imbalance, (c) validation discipline specific to fraud, and (d) how to evaluate unsupervised methods without labeled data.
What This Part Should Cover
A cost-based threshold derivation, obtained by comparing the expected costs of each action and solving for the crossover fraud probability, along with the need for probability calibration and amount-dependent / segmented thresholds.
Imbalance-aware ranking metrics (PR-AUC / Average Precision over ROC-AUC; precision/recall/Fβ at the operating point; recall@budget, lift@top-k) plus business KPIs (fraud-$ rate on approved volume, false-decline rate, approval rate).
Validation discipline: time-based (not random) splits, label-maturity/delay handling, and selection bias from unlabeled declined transactions.
Label-free evaluation of unsupervised methods: delayed-label backtesting, analyst precision@N, proxy/weak labels, online A/B or shadow lift, and overlap with the supervised high-risk tail.
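The threshold derivation and two of the ranking metrics above can be written out in a few lines. For the two-action case, approving costs p·L in expectation and declining costs (1 − p)·C, where p is the calibrated fraud probability, L the fraud loss, and C the false-decline cost; declining is cheaper once p exceeds C / (L + C). The function names and toy labels below are illustrative assumptions, and the formula only holds if the score is a calibrated probability.

```python
def breakeven_threshold(fraud_loss, false_decline_cost):
    """Fraud probability above which declining is cheaper than approving."""
    return false_decline_cost / (fraud_loss + false_decline_cost)


def average_precision(labels, scores):
    """Mean of precision at each rank where a fraud case appears (ranked by score, descending)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def recall_at_budget(labels, scores, budget):
    """Share of all fraud cases found in the top `budget` scored transactions."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    positives = sum(labels)
    caught = sum(y for _, y in ranked[:budget])
    return caught / positives if positives else 0.0


# Toy, hand-made labels and scores purely for illustration.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
print(breakeven_threshold(fraud_loss=100.0, false_decline_cost=25.0))
print(average_precision(labels, scores), recall_at_budget(labels, scores, budget=2))
```

When L and C both depend on the amount or the segment, the break-even probability does too, which is the motivation for amount-dependent or segmented thresholds. `recall_at_budget` maps directly onto a fixed review capacity: the budget is the number of cases analysts can handle.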
What a Strong Answer Covers
Across all four parts, the strongest answers treat this as one coherent cost-sensitive, real-time decisioning problem rather than four disconnected questions. Cross-cutting dimensions the interviewer looks for:
Cost-sensitivity threaded throughout: the same expected-dollar-loss objective drives the Part 1 decisions, the rule/model layering in Part 2, the deployment caution in Part 3, and the thresholds in Part 4.
Respect for the two structural traps: label leakage / point-in-time correctness, and delayed/censored/noisy labels, surfacing consistently in features (Part 2), unsupervised evaluation (Part 3), and validation (Part 4).
Layered, defense-in-depth thinking: rules for the known-bad tail, calibrated models for the ambiguous middle, anomaly/graph detectors for novel and collusive fraud.
Production realism: latency budgets with fallbacks, online/offline parity, calibration, drift monitoring, and safe rollout (shadow → champion/challenger → ramp).
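Drift monitoring has to work before labels mature, so one common label-free check is to compare the live score distribution against a reference window. The sketch below uses the population stability index (PSI) as one possible statistic; the bucket edges, the toy score samples, and the function name are assumptions, and any alert threshold would need tuning against the team's own history.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between two samples over fixed bucket edges."""

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(v >= e for e in edges)  # number of edges at or below v = bucket index
            counts[idx] += 1
        n = len(values)
        return [max(c / n, eps) for c in counts]  # floor avoids log(0) on empty buckets

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy model-score samples: a reference week and a later week with a heavier high-score tail.
edges = [0.1, 0.5]
reference_scores = [0.05] * 80 + [0.30] * 15 + [0.80] * 5
current_scores = [0.05] * 60 + [0.30] * 25 + [0.80] * 15

psi_same = psi(reference_scores, reference_scores, edges)
psi_shift = psi(reference_scores, current_scores, edges)
print(psi_same, psi_shift)
```

A rising PSI on scores or on key input features does not say whether fraud itself changed or only the traffic mix did, so it is best treated as a trigger for investigation and for checking matured labels, not as an automatic retrain signal.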
Follow-up Questions
An adversary adapts: fraud patterns drift the week after you ship. How do you detect this in production and how often (and on what data window) do you retrain given label delay?
PSD2/SCA can mandate step-up for many CNP transactions, but exemptions (e.g. low-value, transaction-risk-analysis) let you skip it. How would you let the risk model drive exemption decisions while staying compliant?
How would you handle the cold-start problem for a brand-new merchant or a customer's first transaction, where per-entity history is empty?
Suppose false declines are quietly costing more than fraud losses (good customers are being blocked). How would you detect and quantify this, given that declined transactions have no outcome label?