This question evaluates a candidate's ability to design an end-to-end transaction fraud detection system. It tests feature engineering, handling of delayed and partial labels and class imbalance, real-time feature retrieval and model serving, decision thresholding under manual-review constraints, feedback-loop integration, monitoring, and backtesting. It is commonly asked in ML system design interviews for data scientist roles to assess both conceptual understanding and the practical trade-offs of production machine learning and data engineering.
You are given a large, multi-table dataset of transactions and customer/merchant metadata. Fraud labels arrive with delays (e.g., chargebacks weeks later) and may be partial (e.g., only for reviewed or disputed transactions). Design an end-to-end system to decide, in real time, whether to approve, decline, or send each transaction to manual review.
Cover the following aspects with clear assumptions and rationale:
State reasonable assumptions if needed and justify key design choices.
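As a starting point for the approve / decline / manual review decision, the following is a minimal sketch, not a reference solution. It assumes the model emits a fraud score in [0, 1], that a decline cut-off has already been chosen, and that manual review is limited to a fixed number of transactions per scoring window. The names `Thresholds`, `route`, and `fit_review_threshold` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Thresholds:
    review: float   # scores at or above this (and below `decline`) go to manual review
    decline: float  # scores at or above this are declined outright


def route(score: float, t: Thresholds) -> str:
    """Map a fraud score to one of the three actions in the prompt."""
    if score >= t.decline:
        return "decline"
    if score >= t.review:
        return "review"
    return "approve"


def fit_review_threshold(
    scores: Sequence[float], decline: float, review_capacity: int
) -> float:
    """Pick the lowest review threshold whose queue fits the review capacity.

    `scores` is a historical sample of model scores (an assumption: it is
    representative of live traffic). `review_capacity` is how many
    transactions from that sample analysts could have reviewed.
    """
    # Only transactions below the decline cut-off can land in the review band.
    candidates = sorted((s for s in scores if s < decline), reverse=True)
    if review_capacity <= 0 or not candidates:
        return decline  # empty review band: everything is approve or decline

    k = min(review_capacity, len(candidates))
    threshold = candidates[k - 1]

    # If scores tied at the threshold would overflow capacity, move the
    # threshold strictly above the tie so the queue never exceeds capacity.
    if k < len(candidates) and candidates[k] == threshold:
        higher = [s for s in candidates if s > threshold]
        return min(higher) if higher else decline
    return threshold


if __name__ == "__main__":
    history = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.95]
    review_cut = fit_review_threshold(history, decline=0.9, review_capacity=2)
    t = Thresholds(review=review_cut, decline=0.9)
    for s in (0.95, 0.70, 0.30):
        print(s, route(s, t))
```

Fitting the review threshold to capacity rather than fixing it by hand keeps the review queue bounded as the score distribution drifts. A fuller answer would also weigh the expected cost of false declines against fraud losses when choosing the decline cut-off.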