This question evaluates a candidate's competency in designing low-latency, production-grade real-time machine learning systems for account-takeover (ATO) detection in payment authorization. It covers label definition under delayed and noisy labels, feature engineering, model selection and calibration, evaluation protocols, drift monitoring, and policy integration. The question is commonly asked in the Machine Learning domain because it tests the ability to balance strict latency and data constraints against risk-management objectives, combining conceptual understanding with the practical application of ML systems engineering.
Design a real-time machine learning system that scores Venmo payment authorization events for ATO risk. The system must operate under strict latency and data constraints while dealing with delayed and noisy labels. Address parts A–F below; illustrative Python sketches for each part follow the question.
A) Precisely define the positive label (ATO) and the negative set. Discuss positive–unlabeled (PU) learning and how to construct reliable training data with delayed/noisy labels.
B) Propose features across device, IP, behavior, network/graph, and account age. For at least three features, specify leakage risks and explain how you would make them point-in-time correct (time-travel-proof).
C) Select and justify a model family (e.g., gradient boosting with monotonic constraints). Describe probability calibration (Platt vs. isotonic) and how to maintain calibration across account-age cohorts.
D) Describe an offline evaluation protocol (time-based split, label-latency handling, group-aware CV) and online validation (shadow mode, interleaving with rules).
E) Outline drift/adversary monitoring and automated retraining triggers (e.g., PSI thresholds, population/conditional shift tests).
F) Explain how to combine ML scores with deterministic rules via a policy engine to meet business constraints (block, step-up auth, allow). Show how to set per-segment thresholds to hit target FP/FN budgets.
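For part A, a minimal sketch of how training labels might be constructed under delayed, noisy reporting, assuming hypothetical tables of authorization events and confirmed ATO reports and an assumed 45-day maturation window. Events too recent to have matured are kept out of the negative class (treated as unlabeled, in the PU spirit) rather than labeled negative.

```python
# Hypothetical column names; the 45-day maturation window is an assumption to tune.
import pandas as pd

MATURATION_DAYS = 45  # assumed delay after which ATO reports are considered complete


def build_training_labels(events: pd.DataFrame,
                          ato_reports: pd.DataFrame,
                          as_of: pd.Timestamp) -> pd.DataFrame:
    """Label authorization events for training.

    events:      one row per authorization (event_id, account_id, event_time, ...)
    ato_reports: confirmed ATO cases (account_id, takeover_start, takeover_end, report_time)
    as_of:       training cutoff, i.e. what was known when the training set was built
    """
    # Only use reports that existed by the cutoff to avoid peeking into the future.
    reports = ato_reports[ato_reports["report_time"] <= as_of]

    # Positive: the event falls inside a confirmed takeover window for its account.
    joined = events.merge(reports, on="account_id", how="left")
    is_positive = (
        joined["takeover_start"].notna()
        & (joined["event_time"] >= joined["takeover_start"])
        & (joined["event_time"] <= joined["takeover_end"])
    )
    joined["label"] = is_positive.astype(int)
    labeled = joined.groupby("event_id", as_index=False).agg(
        {"account_id": "first", "event_time": "first", "label": "max"}
    )

    # PU treatment: events too recent to have matured are unlabeled, not negative.
    mature = labeled["event_time"] <= as_of - pd.Timedelta(days=MATURATION_DAYS)
    return labeled[mature | (labeled["label"] == 1)]
```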
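For part B, one way to backfill a point-in-time-safe behavioral aggregate offline: the number of distinct devices seen on an account in the 30 days strictly before each authorization. Column names are assumptions, and the same definition would have to be reproduced by the online feature store to avoid training/serving skew.

```python
# Offline backfill of a leakage-safe aggregate; column names are hypothetical.
import pandas as pd


def prior_distinct_devices(auth: pd.DataFrame, window: str = "30D") -> pd.Series:
    """Distinct devices seen on the account strictly before each authorization.

    auth columns: account_id, event_time, device_id.
    Counting only rows with event_time earlier than the current event is what makes
    the feature time-travel-proof: nothing observed at or after scoring time leaks in.
    """
    auth = auth.sort_values(["account_id", "event_time"])

    def per_account(g: pd.DataFrame) -> pd.Series:
        counts = []
        for i, row in enumerate(g.itertuples()):
            start = row.event_time - pd.Timedelta(window)
            past = g.iloc[:i]                        # strictly earlier rows only
            past = past[past["event_time"] >= start]
            counts.append(past["device_id"].nunique())
        return pd.Series(counts, index=g.index)

    return auth.groupby("account_id", group_keys=False)[
        ["event_time", "device_id"]
    ].apply(per_account)
```

The quadratic per-account loop is for clarity only; a production backfill would typically use windowed aggregations in the same batch engine that feeds the online feature store, so offline and online definitions stay identical.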
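For part C, a sketch of gradient boosting with monotonic constraints plus per-cohort isotonic calibration, using LightGBM and scikit-learn. The feature list, constraint signs, cohort definition, and synthetic demo data are illustrative assumptions; in practice the calibrators would be fit on a held-out calibration set rather than on training scores.

```python
# Monotonic GBM + per-cohort isotonic calibration; features and cohorts are assumptions.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.isotonic import IsotonicRegression

FEATURES = ["failed_logins_1h", "account_age_days", "amount_usd"]
# Assumed directions: risk non-decreasing in failed logins and amount, non-increasing in age.
MONOTONE = [1, -1, 1]

model = LGBMClassifier(n_estimators=200, learning_rate=0.05, monotone_constraints=MONOTONE)


def fit_cohort_calibrators(raw_scores, labels, cohorts):
    """Fit one isotonic mapping per account-age cohort (use a held-out set in practice)."""
    calibrators = {}
    for cohort in np.unique(cohorts):
        mask = cohorts == cohort
        iso = IsotonicRegression(out_of_bounds="clip")
        iso.fit(raw_scores[mask], labels[mask])
        calibrators[cohort] = iso
    return calibrators


def calibrated_score(calibrators, raw_score, cohort):
    return float(calibrators[cohort].predict([raw_score])[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    X = np.column_stack([
        rng.poisson(1.0, n),           # failed_logins_1h
        rng.integers(1, 2000, n),      # account_age_days
        rng.lognormal(3.0, 1.0, n),    # amount_usd
    ])
    y = (rng.random(n) < 0.02 + 0.05 * (X[:, 0] > 2)).astype(int)
    cohorts = np.where(X[:, 1] < 30, "new", "established")

    model.fit(X, y)
    raw = model.predict_proba(X)[:, 1]
    cal = fit_cohort_calibrators(raw, y, cohorts)      # training data reused for brevity
    print(calibrated_score(cal, raw[0], cohorts[0]))
```

A common rule of thumb is to prefer Platt (sigmoid) scaling for small cohorts, since isotonic regression can overfit on limited calibration data.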
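For part D, a sketch of a rolling, time-ordered evaluation split that leaves a label-maturation gap before each test window and keeps accounts from appearing on both sides of a fold. The 45-day gap, 30-day test horizon, and column names are assumptions; online validation (shadow mode, interleaving with the rules engine) sits outside this offline harness.

```python
# Time-based, group-aware evaluation splits; window sizes and columns are assumptions.
import pandas as pd

MATURATION = pd.Timedelta(days=45)   # assumed label-latency gap


def time_splits(df: pd.DataFrame, n_folds: int = 4, horizon_days: int = 30):
    """Yield (train_index, test_index) pairs that respect time order, a label-maturation
    gap before each test window, and account-level separation between train and test."""
    df = df.sort_values("event_time")
    end = df["event_time"].max()
    horizon = pd.Timedelta(days=horizon_days)
    for k in range(n_folds, 0, -1):                     # earliest fold first
        test_end = end - (k - 1) * horizon
        test_start = test_end - horizon
        train_cutoff = test_start - MATURATION          # only matured labels in train
        train_mask = df["event_time"] < train_cutoff
        test_mask = (df["event_time"] >= test_start) & (df["event_time"] < test_end)
        # Group-aware: drop test events from accounts also seen in train so that
        # account-level memorization cannot inflate the metrics.
        train_accounts = set(df.loc[train_mask, "account_id"])
        test_mask &= ~df["account_id"].isin(train_accounts)
        yield df.index[train_mask.to_numpy()], df.index[test_mask.to_numpy()]
```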
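For part E, a sketch of PSI-based drift monitoring with a simple retraining trigger. The bin count and the 0.1/0.2 thresholds are conventional starting points to tune, not fixed requirements; adversarial adaptation would additionally be tracked through conditional metrics (e.g., score distributions on confirmed fraud) as labels mature.

```python
# Population Stability Index (PSI) drift check with an assumed retraining trigger.
import numpy as np


def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference (training-time) distribution and a live window."""
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf               # cover out-of-range live values
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)                # avoid log(0) / divide-by-zero
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


def should_retrain(feature_psis: dict, score_psi: float,
                   feature_threshold: float = 0.2, score_threshold: float = 0.1) -> bool:
    """Trigger retraining when the score or any monitored feature drifts past its threshold."""
    return score_psi > score_threshold or any(
        v > feature_threshold for v in feature_psis.values()
    )
```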
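For part F, a sketch of a policy layer in which deterministic rules take precedence over the model and per-segment score thresholds are derived from an explicit false-positive budget. The rule names, segments, and budgets are hypothetical.

```python
# Policy layer combining rules with per-segment thresholds; names and budgets are assumptions.
import numpy as np


def threshold_for_fp_budget(scores: np.ndarray, labels: np.ndarray, max_fpr: float) -> float:
    """Approximate threshold whose false-positive rate on legitimate traffic meets the budget."""
    legit = scores[labels == 0]
    # Allow roughly max_fpr of legitimate events to score at or above the threshold.
    return float(np.quantile(legit, 1.0 - max_fpr))


def decide(event: dict, score: float, thresholds: dict) -> str:
    """Return one of 'block', 'step_up', 'allow'."""
    # Deterministic rules run first and override the model.
    if event.get("device_on_denylist"):
        return "block"
    if event.get("verified_trusted_device") and event.get("amount_usd", 0) < 50:
        return "allow"

    segment = "new_account" if event.get("account_age_days", 0) < 30 else "established"
    block_t, step_up_t = thresholds[segment]   # e.g. {"new_account": (0.90, 0.60), ...}
    if score >= block_t:
        return "block"
    if score >= step_up_t:
        return "step_up"
    return "allow"
```

Deriving block and step-up thresholds separately per segment keeps each segment within its own false-positive budget even when score distributions differ across cohorts.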