You receive a time‑stamped transactions dataset: columns [event_time (UTC), customer_id, merchant_id, amount, country, device_type, features...], with label is_fraud ∈ {0, 1} at a 0.4% positive rate across 50M rows. Deliver a supervised fraud model and an unsupervised anomaly detector.

A) Explain logistic regression end to end: the logit link and odds, the decision boundary, the effects of L1/L2/elastic‑net penalties on coefficients and feature selection, why feature scaling matters for convergence, and how to obtain well‑calibrated probabilities (e.g., Platt scaling vs. isotonic regression) after class weighting or resampling.

B) Imbalance: design training and evaluation to avoid leakage and to handle the 0.4% base rate. Be specific about temporal cross‑validation (train strictly before validation, validation strictly before test in time), stratification, resampling applied only to the training set (SMOTE/ADASYN vs. class weights vs. focal loss), threshold selection under asymmetric costs (assume a false negative costs $100 and a false positive $2; derive the optimal threshold rule and explain how you would estimate it), and the metrics you will optimize (PR‑AUC, recall at precision ≥ 0.9, expected cost).

C) Anomaly detection plan: choose and justify one method (Isolation Forest, One‑Class SVM, or a deep autoencoder) for cold‑start merchants with no labels; define features, windowing, contamination‑rate selection, and alerting. Explain validation without labels (synthetic anomalies, investigator concordance), plus drift monitoring, retraining cadence, and safe rollout.

D) List two subtle leakage risks (e.g., using post‑event chargebacks, or aggregations derived from future data) and two ways you will detect and mitigate concept drift post‑deployment.
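A minimal sketch of the leakage‑safe temporal split asked for in part B: train strictly precedes validation, which strictly precedes test. The list‑of‑dicts record format and the cutoff timestamps are illustrative assumptions, not part of the stated dataset schema:

```python
def temporal_split(rows, val_start, test_start):
    """Partition records strictly by event_time so that
    train < val < test in time; any resampling (SMOTE/ADASYN)
    or class weighting is then applied to the train slice only."""
    assert val_start < test_start
    train = [r for r in rows if r["event_time"] < val_start]
    val = [r for r in rows if val_start <= r["event_time"] < test_start]
    test = [r for r in rows if r["event_time"] >= test_start]
    return train, val, test

# Rolling-origin evaluation: repeat with later cutoffs so every
# fold trains only on the past. ISO-8601 strings sort correctly.
rows = [{"event_time": f"2024-0{m}-01", "is_fraud": 0} for m in range(1, 7)]
train, val, test = temporal_split(rows, "2024-04-01", "2024-06-01")
```

Splitting on time rather than on shuffled rows is what prevents future transactions (and future‑derived aggregates) from leaking into training.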
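For the threshold rule in part B: with a calibrated fraud probability p, flagging is optimal when the expected cost of missing fraud, p·c_FN, exceeds the expected cost of a false alarm, (1 − p)·c_FP, which rearranges to p* = c_FP / (c_FP + c_FN) = 2/102 ≈ 0.0196. A minimal sketch, with an empirical cost function for estimating the threshold on a labeled validation window (function names are illustrative):

```python
def optimal_threshold(c_fn: float, c_fp: float) -> float:
    """Bayes-optimal flagging threshold for a calibrated fraud
    probability p: flag when p * c_fn > (1 - p) * c_fp, i.e. when
    p > c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)

def expected_cost(y_true, p_pred, threshold, c_fn=100.0, c_fp=2.0):
    """Total realized cost on a labeled sample at a given threshold;
    sweeping this over the validation window estimates the threshold
    empirically instead of trusting calibration alone."""
    cost = 0.0
    for y, p in zip(y_true, p_pred):
        flagged = p >= threshold
        if y == 1 and not flagged:
            cost += c_fn  # missed fraud
        elif y == 0 and flagged:
            cost += c_fp  # false alarm
    return cost

t_star = optimal_threshold(c_fn=100.0, c_fp=2.0)  # 2/102, about 0.0196
```

The closed‑form p* only holds if the probabilities are well calibrated, which is why part A's Platt/isotonic step matters before thresholding.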
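One answer shape for part C's label‑free validation: inject synthetic anomalies into clean traffic and measure how many the detector recovers. A toy illustration, using a robust z‑score detector purely as a stand‑in for the chosen method (Isolation Forest, One‑Class SVM, autoencoder); the amounts and the z cutoff are illustrative:

```python
import statistics

def robust_z_flags(amounts, z=6.0):
    """Toy stand-in detector: flag amounts far from the median,
    measured in MAD units (1.4826 * MAD approximates sigma for
    Gaussian data). The real system would use the chosen model."""
    med = statistics.median(amounts)
    mad = statistics.median(abs(a - med) for a in amounts) or 1.0
    return [abs(a - med) / (1.4826 * mad) > z for a in amounts]

# Validation without labels: append synthetic anomalies to normal
# traffic and check the detector's recall on the injected points.
normal = [10.0 + 0.5 * (i % 7) for i in range(200)]
synthetic = [500.0, 750.0]  # injected anomalies
flags = robust_z_flags(normal + synthetic)
recall_on_synthetic = sum(flags[-2:]) / len(synthetic)
```

The same harness doubles as a drift probe post‑deployment: a drop in recall on a fixed synthetic set signals that the detector's notion of "normal" has shifted.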