Design real-time payments fraud model under constraints
Company: Roblox
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: HR Screen
You’re tasked with reducing unauthorized purchases by minors using their parents’ credit cards on a large gaming platform with real-time checkout. Design a production ML solution that decides among actions {allow, step-up auth (e.g., CVV/SCA), hold-for-review, block} within 30 ms p99.
Answer precisely:
1) Problem framing and labels: With chargebacks/disputes arriving 2–8 weeks later and some never disputed, define positives/negatives. Would you treat this as PU learning, cost-sensitive classification, or uplift modeling for action choice? Justify.
2) Class imbalance: If positives are ~0.2%, specify the loss, the sampling/weighting strategy (e.g., focal loss vs class weights), and how you’ll calibrate scores. Show the decision threshold formula minimizing expected cost: t* = argmin_t [FP(t)*C_fp + FN(t)*C_fn + ActionCosts(t)].
3) Features: Propose high-signal, low-latency features (payment velocity, device consistency, age-on-payment, billing-IP mismatch, historical dispute rates, network/household signals). Explain leakage risks and how you’ll do out-of-fold target encoding safely.
4) Real-time architecture: Sketch the online feature store, TTLs, and fallbacks for cold-start or feature timeouts. What do you cache at the edge vs compute on demand? How do you enforce p99 < 30 ms?
5) Drift/adversaries: Describe backtesting with strictly forward time splits, population stability/PSI monitors, and online shadow evaluation. How do you update without amplifying feedback loops?
6) Evaluation: Choose metrics beyond PR-AUC (e.g., cost curves, expected profit, constrained ROC for max FP rate). Describe offline policy evaluation (IPS/DR) to estimate the impact of step-up auth vs block before running a risky full A/B test.
7) Safety/UX: Propose a tiered action policy (risk score → action), human review routing, and appeals. What fairness/age-related checks do you implement, and what business guardrails (e.g., max block rate for verified adults) do you enforce?
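The expected-cost threshold in part 2 can be illustrated with a small sketch. It assumes binary matured labels, scores where higher means riskier, and that the action-cost term is a fixed cost per flagged transaction; the function name `best_threshold` and the `action_cost` parameter are hypothetical, not part of the question. If scores are well calibrated and action costs are ignored, the same rule reduces to flagging when p > C_fp / (C_fp + C_fn).

```python
import numpy as np

def best_threshold(y_true, scores, c_fp, c_fn, action_cost=0.0):
    """Return (threshold, cost) minimizing FP*c_fp + FN*c_fn + flagged*action_cost.

    Assumption: a transaction is flagged when score >= threshold, and
    action_cost is paid once per flagged transaction.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    # Candidate thresholds: every observed score, plus +inf (flag nothing).
    candidates = np.append(np.unique(scores), np.inf)
    best_t, best_cost = None, np.inf
    for t in candidates:
        flagged = scores >= t
        fp = np.sum(flagged & (y_true == 0))   # legitimate purchases flagged
        fn = np.sum(~flagged & (y_true == 1))  # unauthorized purchases missed
        cost = fp * c_fp + fn * c_fn + flagged.sum() * action_cost
        if cost < best_cost:
            best_t, best_cost = float(t), float(cost)
    return best_t, best_cost
```

The threshold should be chosen on a held-out, time-forward sample rather than on training data, since the optimum shifts with the base rate and with calibration quality.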
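The out-of-fold target encoding asked about in part 3 might look like the following minimal sketch. It assumes fold ids are assigned upstream (for fraud data, ideally by time block so later labels never inform earlier rows) and uses additive smoothing toward the out-of-fold base rate; `oof_target_encode` and `prior_weight` are illustrative names.

```python
from collections import defaultdict

def oof_target_encode(categories, labels, fold_ids, prior_weight=20.0):
    """Encode each row with label statistics from the OTHER folds only.

    A row's own label (and its whole fold) never contributes to its
    encoding, which is what prevents target leakage.
    """
    n = len(labels)
    global_sum = float(sum(labels))
    total = defaultdict(lambda: [0.0, 0])     # category -> [label sum, count]
    per_fold = defaultdict(lambda: [0.0, 0])  # (category, fold) -> [sum, count]
    fold_sum = defaultdict(float)
    fold_cnt = defaultdict(int)
    for c, y, f in zip(categories, labels, fold_ids):
        total[c][0] += y
        total[c][1] += 1
        per_fold[(c, f)][0] += y
        per_fold[(c, f)][1] += 1
        fold_sum[f] += y
        fold_cnt[f] += 1

    encoded = []
    for c, f in zip(categories, fold_ids):
        # Category statistics with the row's own fold removed.
        s = total[c][0] - per_fold[(c, f)][0]
        k = total[c][1] - per_fold[(c, f)][1]
        # Base rate computed without the row's own fold.
        prior = (global_sum - fold_sum[f]) / max(n - fold_cnt[f], 1)
        # Smoothed mean: rare categories shrink toward the prior.
        encoded.append((s + prior_weight * prior) / (k + prior_weight))
    return encoded
```

At serving time the encoding would instead come from a table built on all matured training labels, so the out-of-fold step applies only when constructing training features.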
Quick Answer: This question evaluates a data scientist's competency in real-time fraud detection and policy design. It covers cost-sensitive modeling, delayed and positive-unlabeled labels, severe class imbalance, low-latency feature engineering and online feature stores, drift and adversarial monitoring, offline policy evaluation, and fairness and UX constraints. It belongs to the Machine Learning domain and tests both conceptual understanding and practical system-level application. It is commonly asked to assess an interviewee's ability to balance latency and business costs, reason about delayed and noisy labels, design deployable low-latency architectures, and define evaluation metrics and safety guardrails for production fraud policies.
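For the PSI monitor mentioned in part 5, one common formulation bins a reference sample by its own quantiles and compares bin shares against a recent sample. The sketch below assumes a single numeric feature or score; the function name `psi`, the bin count, and the `eps` floor are illustrative choices. The often-quoted alert levels of roughly 0.1 and 0.25 are rules of thumb, not fixed standards.

```python
import numpy as np

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index of `actual` relative to `expected`."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Quantile edges from the reference sample; drop duplicates caused by ties.
    edges = np.unique(np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1)))
    # Use interior edges only, so out-of-range values land in the end bins.
    interior = edges[1:-1]
    k = len(interior) + 1
    e_idx = np.searchsorted(interior, expected, side="right")
    a_idx = np.searchsorted(interior, actual, side="right")
    e_frac = np.bincount(e_idx, minlength=k) / len(expected)
    a_frac = np.bincount(a_idx, minlength=k) / len(actual)
    # Floor the shares so empty bins do not produce log(0).
    e_frac = np.clip(e_frac, eps, None)
    a_frac = np.clip(a_frac, eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```

In a monitoring job this would typically run per feature and on the model score, comparing a training-period reference window against a rolling production window.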
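The offline policy evaluation in part 6 can be sketched with inverse propensity scoring. This covers IPS and its self-normalized variant only, not the doubly robust estimator, and it assumes the logging policy randomized actions with known, nonzero propensities. The names `ips_value`, `target_probs`, and `clip` are hypothetical.

```python
import numpy as np

def ips_value(rewards, logged_propensities, target_probs, clip=10.0):
    """Estimate a candidate policy's mean reward from logged decisions.

    rewards:             observed reward for the action actually taken
    logged_propensities: probability the logging policy gave that action
    target_probs:        probability the candidate policy gives that action
    Returns (clipped IPS estimate, self-normalized IPS estimate).
    """
    r = np.asarray(rewards, dtype=float)
    p_log = np.asarray(logged_propensities, dtype=float)
    p_new = np.asarray(target_probs, dtype=float)
    # Importance weights, clipped to limit variance from tiny propensities.
    w = np.minimum(p_new / p_log, clip)
    ips = float(np.mean(w * r))
    w_sum = float(np.sum(w))
    # Self-normalized IPS trades a small bias for lower variance.
    snips = float(np.sum(w * r) / w_sum) if w_sum > 0 else float("nan")
    return ips, snips
```

For example, a candidate policy that replaces some blocks with step-up auth can be scored on logs where a small randomized slice of traffic received each action, with reward defined as net value after chargebacks and friction costs.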