You are interviewing for a Machine Learning Engineer role at a FinTech company.
Part 1: Explain the following ML fundamentals:
- What is overfitting?
- How can you detect overfitting during model development?
- How do L1 and L2 regularization reduce overfitting, and how do they differ?
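The Part 1 questions can be made concrete with a small experiment. The sketch below is one possible illustration, not a model answer: it assumes scikit-learn and NumPy are available, uses synthetic regression data (an assumption, chosen so that only 3 of 60 features matter), and picks the `alpha` values arbitrarily. It compares training and validation scores to expose overfitting, then counts exactly-zero coefficients to show how an L1 penalty (Lasso) differs from an L2 penalty (Ridge).

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 80 rows, 60 features,
# of which only the first 3 actually influence the target.
n_samples, n_features = 80, 60
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=1.0, size=n_samples)

# Hold out half the rows so the training set has fewer rows than features,
# which makes overfitting easy to provoke.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# The alpha values are arbitrary illustrative choices, not tuned.
models = {
    "unregularized": LinearRegression(),
    "L1 (Lasso)": Lasso(alpha=0.2),
    "L2 (Ridge)": Ridge(alpha=10.0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        # A large gap between these two scores is the usual overfitting signal.
        "train_r2": model.score(X_train, y_train),
        "val_r2": model.score(X_val, y_val),
        # L1 can set coefficients exactly to zero; L2 only shrinks them.
        "zero_coefs": int(np.sum(model.coef_ == 0.0)),
    }
    print(name, results[name])
```

With more features than training rows, the unregularized model can fit the training data almost perfectly while scoring worse on the held-out half, which is the train/validation gap to look for. The L1 penalty drives some coefficients to exactly zero (implicit feature selection), whereas the L2 penalty shrinks all coefficients toward zero without eliminating them.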
Part 2: You are given a payment-transaction dataset for fraud detection. Each row represents one transaction and includes a binary label is_fraud, along with typical transaction features such as amount, timestamp-derived features, merchant category, country, payment method, device attributes, and customer history aggregates.
Build a runnable machine-learning pipeline that trains a fraud detection model. Your solution should:
- Split the data into train, validation, and test sets without leaking future information.
- Handle missing values and categorical features.
- Address severe class imbalance.
- Train at least one reasonable baseline model.
- Evaluate the model using metrics appropriate for fraud detection.
- Explain what you would improve if you had more time.
You may use standard ML libraries and AI coding tools, but the final code must run end-to-end.
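A minimal sketch of how the Part 2 requirements could fit together is shown below. It is a starting point under stated assumptions, not a reference solution: the real dataset is not given here, so the code generates synthetic transactions, and every column name other than `is_fraud` (`timestamp`, `amount`, `merchant_category`, `country`, `payment_method`) is hypothetical. It assumes pandas and scikit-learn, splits chronologically (70/15/15), imputes and one-hot encodes inside a pipeline so statistics come from the training set only, uses class weighting for imbalance, and picks the decision threshold on the validation set.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, precision_recall_curve,
                             precision_score, recall_score, roc_auc_score)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real data; all columns except is_fraud are hypothetical.
rng = np.random.default_rng(42)
n = 20_000
df = pd.DataFrame({
    "timestamp": pd.Timestamp("2024-01-01")
    + pd.to_timedelta(rng.integers(0, 90 * 24 * 3600, size=n), unit="s"),
    "amount": rng.lognormal(mean=3.5, sigma=1.0, size=n),
    "merchant_category": rng.choice(["grocery", "travel", "electronics", "gaming"], size=n),
    "country": rng.choice(["US", "GB", "DE", "BR"], size=n),
    "payment_method": rng.choice(["card", "wallet", "bank_transfer"], size=n),
})
logit = (-5.0 + 1.2 * (np.log(df["amount"]) - 3.5)
         + 1.0 * (df["merchant_category"] == "gaming")
         + 0.8 * (df["country"] == "BR"))
df["is_fraud"] = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Inject missing values so the imputation steps have something to do.
df.loc[rng.random(n) < 0.05, "amount"] = np.nan
df.loc[rng.random(n) < 0.05, "country"] = np.nan

# Stateless, row-level features: safe to compute before splitting.
df["log_amount"] = np.log1p(df["amount"])
df["hour"] = df["timestamp"].dt.hour

# Chronological split: train on the past, validate and test on the future.
df = df.sort_values("timestamp").reset_index(drop=True)
train_end, val_end = int(0.70 * n), int(0.85 * n)
train, val, test = df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]

numeric_cols = ["log_amount", "hour"]
categorical_cols = ["merchant_category", "country", "payment_method"]
features = numeric_cols + categorical_cols

# Imputers, scaler, and encoder are fitted on the training split only.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median", add_indicator=True)),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    # class_weight="balanced" reweights the rare fraud class during training.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
model.fit(train[features], train["is_fraud"])

# Choose the decision threshold on validation data (here: the F1-maximizing one).
val_scores = model.predict_proba(val[features])[:, 1]
precision, recall, thresholds = precision_recall_curve(val["is_fraud"], val_scores)
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
threshold = thresholds[np.argmax(f1)]

# Report on the untouched test split; PR-AUC is compared against the fraud rate.
test_scores = model.predict_proba(test[features])[:, 1]
test_pred = (test_scores >= threshold).astype(int)
test_metrics = {
    "fraud_rate": float(test["is_fraud"].mean()),
    "pr_auc": average_precision_score(test["is_fraud"], test_scores),
    "roc_auc": roc_auc_score(test["is_fraud"], test_scores),
    "precision": precision_score(test["is_fraud"], test_pred, zero_division=0),
    "recall": recall_score(test["is_fraud"], test_pred, zero_division=0),
}
print(test_metrics)
```

Accuracy is deliberately left out: with a fraud rate of a few percent, a model that predicts "not fraud" for every row would still look accurate. The F1-maximizing threshold is only one option; in practice the threshold would more likely be set from the relative cost of missed fraud versus false alarms. Customer history aggregates, which the prompt mentions, are not modeled here and would need to be computed using only transactions earlier than each row to avoid leakage.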