Design a fraud detection system
Company: Google
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: medium
Interview Round: Technical Screen
## Scenario
You are designing an end-to-end **fraud detection system** for an online platform (e.g., e-commerce marketplace, payments, account signup, or ad traffic). The system should detect and prevent fraudulent activity while minimizing impact on legitimate users.
## Requirements
1. **Goal**: Predict whether an event (transaction / login / signup / ad click) is fraudulent and decide what action to take.
2. **Latency**: Support near-real-time decisioning (e.g., sub-second to a few seconds) for high-risk actions.
3. **Cold start**: Handle **new users / new devices / new merchants** with little or no historical data.
4. **Imbalanced data**: Fraud rate is low (e.g., <1%), so the dataset is highly **class-imbalanced**.
5. **Actions**: Decide between actions such as *allow*, *step-up verification (2FA / OTP)*, *manual review*, or *block*.
6. **Learning loop**: Incorporate delayed labels (chargebacks, user reports, investigation outcomes) and retrain/refresh models.
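
One way to picture requirements 3 and 5 together is a policy layer that maps a model's risk score to an action and applies stricter cut-offs when an entity has little history. The sketch below is illustrative only: the `Thresholds` values, the `history_events` input, and the `min_history` cut-off are assumptions made up for this example, not part of the question, and real thresholds would be tuned against fraud loss, review capacity, and user friction.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    STEP_UP = "step_up"              # 2FA / OTP challenge
    MANUAL_REVIEW = "manual_review"
    BLOCK = "block"


@dataclass(frozen=True)
class Thresholds:
    step_up: float
    review: float
    block: float


# Hypothetical cut-offs chosen only for illustration.
ESTABLISHED = Thresholds(step_up=0.30, review=0.60, block=0.90)
# Cold-start entities are challenged earlier because the score is less reliable.
COLD_START = Thresholds(step_up=0.15, review=0.50, block=0.90)


def decide(score: float, history_events: int, min_history: int = 5) -> Action:
    """Map a fraud probability in [0, 1] to an action.

    `history_events` is an assumed count of prior events seen for the
    user/device/merchant; fewer than `min_history` is treated as cold start.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    t = COLD_START if history_events < min_history else ESTABLISHED
    if score >= t.block:
        return Action.BLOCK
    if score >= t.review:
        return Action.MANUAL_REVIEW
    if score >= t.step_up:
        return Action.STEP_UP
    return Action.ALLOW
```

With these assumed thresholds, `decide(0.2, history_events=0)` returns `Action.STEP_UP`, while the same score for an entity with 100 prior events returns `Action.ALLOW`. Keeping the policy separate from the model lets thresholds change without retraining.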
## What to cover
- Data sources and feature engineering (real-time + batch)
- Model choice(s) and how you handle cold start + imbalance
- Evaluation metrics and offline/online validation
- System architecture for training, serving, and monitoring
- Abuse/adversarial considerations and how you prevent model exploitation
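
For the evaluation-metrics item, a small worked example can show why accuracy is misleading at a roughly 1% fraud rate and why precision and recall at the operating threshold are more informative. This is a minimal sketch on synthetic data invented for illustration; the function name `precision_recall` and the 0.5 threshold are assumptions, not something the question specifies.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when events with score >= threshold are flagged.

    labels: 1 for fraud, 0 for legitimate. Returns 0.0 for a metric whose
    denominator is zero.
    """
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Synthetic data: 1,000 events, 10 of them fraud (1% fraud rate).
labels = [1] * 10 + [0] * 990
# Made-up scores: 8 frauds scored high, 2 missed, 4 legitimate events scored high.
scores = [0.9] * 8 + [0.2] * 2 + [0.8] * 4 + [0.1] * 986

# A model that allows everything is 99% "accurate" yet catches no fraud.
always_allow_accuracy = sum(1 for y in labels if y == 0) / len(labels)

precision, recall = precision_recall(labels, scores, threshold=0.5)
print(f"always-allow accuracy: {always_allow_accuracy:.2f}")  # 0.99
print(f"precision: {precision:.3f}, recall: {recall:.3f}")    # 0.667, 0.800
```

In this toy setup the flagged set contains 8 true frauds and 4 false positives, so precision is 8/12 and recall is 8/10. Sweeping the threshold traces out the precision-recall trade-off that the action policy has to balance.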
Quick Answer: This question evaluates a candidate's ability to design an end-to-end fraud detection machine learning system: real-time and batch feature engineering, model selection and serving, handling cold start and severe class imbalance, delayed-label learning loops, and adversarial/abuse considerations.