How do I approach Machine Learning interview questions?

Machine Learning questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master machine learning interviews.

What difficulty level is this interview question?

This is a Medium-difficulty Machine Learning question, commonly asked during Technical Screen rounds (company category: Other).

What role is this question designed for?

This question is commonly asked of Data Scientist candidates during technical interviews (company category: Other).

Design anomaly detection and handle imbalanced logistic regression

Quick Overview

This Machine Learning interview question evaluates a data scientist's competencies in supervised logistic regression modeling, class imbalance management, probability calibration, temporal validation, and unsupervised anomaly detection applied to transactional fraud data.

You receive a time‑stamped transactions dataset with columns [event_time (UTC), customer_id, merchant_id, amount, country, device_type, features...]. The label is_fraud ∈ {0,1} has 0.4% positives across 50M rows. Deliver a supervised fraud model and an unsupervised anomaly detector.

A) Explain logistic regression end‑to‑end: the logit link and odds, the decision boundary, the effects of L1/L2/elastic‑net on coefficients and feature selection, why scaling matters for convergence, and how to obtain well‑calibrated probabilities (e.g., Platt vs isotonic) after class weighting or resampling.

B) Imbalance: design training/evaluation to avoid leakage and handle the 0.4% base rate. Be specific about temporal CV (train < val < test in time), stratification, resampling only on train (SMOTE/ADASYN vs class weights vs focal loss), threshold selection under asymmetric costs (assume an FN costs $100 and an FP costs $2; derive the optimal threshold rule and explain how you’d estimate it), and the metrics you’ll optimize (PR‑AUC, recall@precision≥0.9, expected cost).

C) Anomaly detection plan: choose and justify one method (Isolation Forest, One‑Class SVM, deep autoencoder) for cold‑start merchants with no labels; define features, windowing, contamination rate selection, and alerting. Explain validation without labels (synthetic anomalies, investigator concordance), plus drift monitoring, retraining cadence, and safe rollout.

D) List two subtle leakage risks (e.g., using post‑event chargebacks, future‑derived aggregations) and two ways you’ll detect/mitigate concept drift post‑deployment.
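For the threshold question in part B, one possible sketch is shown here. It assumes the model outputs calibrated fraud probabilities and that correct decisions cost nothing. Under those assumptions, flagging a transaction is cheaper in expectation whenever p × 100 > (1 − p) × 2, which gives p > 2 / (2 + 100) ≈ 0.0196. The function names (bayes_threshold, total_cost, empirical_threshold) are hypothetical and chosen for illustration only. The empirical sweep is meant to run on a held‑out validation window that is later in time than the training data.

```python
from typing import Sequence


def bayes_threshold(cost_fn: float, cost_fp: float) -> float:
    """Cost-minimizing cutoff for calibrated probabilities.

    Flag when p * cost_fn > (1 - p) * cost_fp,
    i.e. when p > cost_fp / (cost_fp + cost_fn).
    Assumes true positives and true negatives cost nothing.
    """
    if cost_fn <= 0 or cost_fp <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def total_cost(y_true: Sequence[int], p_fraud: Sequence[float],
               threshold: float, cost_fn: float, cost_fp: float) -> float:
    """Total dollar cost of flagging every transaction with p > threshold."""
    cost = 0.0
    for y, p in zip(y_true, p_fraud):
        flagged = p > threshold
        if y == 1 and not flagged:
            cost += cost_fn   # missed fraud (false negative)
        elif y == 0 and flagged:
            cost += cost_fp   # needless alert (false positive)
    return cost


def empirical_threshold(y_true: Sequence[int], p_fraud: Sequence[float],
                        cost_fn: float, cost_fp: float) -> float:
    """Sweep observed scores on validation data and return the cheapest cutoff.

    Useful as a cross-check when calibration is imperfect, for example
    after class weighting or resampling has distorted the probabilities.
    """
    candidates = [0.0] + sorted(set(p_fraud))
    return min(candidates,
               key=lambda t: total_cost(y_true, p_fraud, t, cost_fn, cost_fp))


if __name__ == "__main__":
    # Costs from the problem statement: FN = $100, FP = $2.
    print(round(bayes_threshold(100, 2), 4))  # 0.0196
```

If the theoretical and empirical cutoffs disagree noticeably on validation data, that is usually a hint that the probabilities need recalibration before the closed‑form rule can be trusted.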
