This question evaluates a data scientist's competence in designing end-to-end machine learning systems for fraud detection, emphasizing challenges such as delayed labels, severe class imbalance, and evolving data distributions (concept drift) in near-real-time scoring.
You are designing a fraud-detection system for an online payments product that must score transactions in (near) real time. Labels for fraud (e.g., chargebacks) arrive with delays, fraud is rare (severe class imbalance), and fraud patterns evolve over time (concept drift).
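One common way to handle delayed labels is to train and evaluate only on transactions whose chargeback window has closed. Otherwise a recent transaction with no chargeback yet would be labelled legitimate, which injects false negatives. The sketch below is a minimal illustration of that idea. The `Txn` record, the `mature_training_set` helper, and the 90-day default window are hypothetical. The right window depends on chargeback rules that the question does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Txn:
    """Hypothetical transaction record used only for this sketch."""
    txn_id: str
    ts: datetime                       # when the transaction was scored
    chargeback_ts: Optional[datetime]  # None if no chargeback observed so far


def mature_training_set(
    txns: List[Txn], now: datetime, maturity_days: int = 90  # assumed window
) -> List[Tuple[str, int]]:
    """Return (txn_id, label) pairs only for transactions whose label is settled.

    Transactions younger than maturity_days are held back entirely (even ones
    with an early chargeback), so recent positives are not over-represented
    relative to recent negatives whose labels are still unknown.
    """
    cutoff = now - timedelta(days=maturity_days)
    labelled = []
    for t in txns:
        if t.ts <= cutoff:
            # Fraud if a chargeback has been observed as of `now`.
            is_fraud = t.chargeback_ts is not None and t.chargeback_ts <= now
            labelled.append((t.txn_id, int(is_fraud)))
    return labelled
```

Transactions still inside the window can be scored and logged, and then they join the training and evaluation sets once they mature.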
Outline the end-to-end ML workflow, covering:
Additionally, explain how you would handle:
Note: Discuss techniques such as resampling, cost-sensitive learning, ROC-AUC/PR-AUC, sliding windows, and automated retraining triggers.
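The techniques named in the note can be sketched with the standard library alone. The following is a minimal, hypothetical sketch. It assumes that scores are calibrated fraud probabilities and that labels are matured 0/1 values. All function names, along with `keep_rate`, the window size, and the tolerance, are illustrative choices and not recommended settings. The sketch covers four techniques:

- Negative undersampling with inverse-rate weights (resampling).
- The cost-sensitive decision threshold p > C_FP / (C_FP + C_FN).
- ROC-AUC versus PR-AUC, with PR-AUC computed here as average precision.
- A sliding-window PR-AUC monitor used as an automated retraining trigger.

```python
import random
from collections import deque
from typing import Sequence


def undersample_negatives(rows, keep_rate: float, rng: random.Random):
    """Resampling: keep every positive and roughly keep_rate of negatives.

    Kept negatives get weight 1 / keep_rate so weighted class totals match the
    original data in expectation (unweighted, predicted probabilities inflate).
    """
    out = []
    for x, y in rows:
        if y == 1:
            out.append((x, y, 1.0))
        elif rng.random() < keep_rate:
            out.append((x, y, 1.0 / keep_rate))
    return out


def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Cost-sensitive rule for a calibrated fraud probability p.

    Flag when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive outranks random negative); ties count 1/2. O(P*N) sketch."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))  # requires both classes present


def pr_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])  # highest score first
    hits, ap = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / k
    return ap / hits  # requires at least one positive


class SlidingWindowMonitor:
    """Retraining trigger: PR-AUC over the last `window` labelled scores vs. a baseline."""

    def __init__(self, baseline: float, window: int = 1000, tolerance: float = 0.1):
        self.baseline = baseline    # PR-AUC measured when the model was deployed
        self.tolerance = tolerance  # absolute drop that triggers retraining
        self.buf: deque = deque(maxlen=window)

    def add(self, label: int, score: float) -> None:
        self.buf.append((label, score))  # oldest entry drops off automatically

    def should_retrain(self) -> bool:
        labels = [y for y, _ in self.buf]
        if len(self.buf) < self.buf.maxlen or sum(labels) == 0:
            return False  # window not full yet, or no positives to evaluate
        current = pr_auc(labels, [s for _, s in self.buf])
        return current < self.baseline - self.tolerance
```

Under severe imbalance, PR-AUC is usually the more informative of the two metrics. ROC-AUC's false-positive rate is divided by the very large number of negatives, so it can stay high while precision on flagged transactions is poor. The monitor is only as timely as its labels. In practice, it would consume matured labels and could run alongside a label-free check on the score distribution.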