PracHub

Design anomaly detection and handle imbalanced logistic regression

Last updated: Mar 29, 2026

Quick Overview

This Machine Learning interview question evaluates a data scientist's competencies in supervised logistic regression modeling, class imbalance management, probability calibration, temporal validation, and unsupervised anomaly detection applied to transactional fraud data.

Company: Other

Role: Data Scientist

Category: Machine Learning

Difficulty: Medium

Interview Round: Technical Screen

You receive a time‑stamped transactions dataset: columns [event_time (UTC), customer_id, merchant_id, amount, country, device_type, features...], label is_fraud ∈ {0,1} with 0.4% positives across 50M rows. Deliver a supervised fraud model and an unsupervised anomaly detector.

A) Explain logistic regression end‑to‑end: the logit link and odds, decision boundary, effects of L1/L2/elastic‑net on coefficients and feature selection, why scaling matters for convergence, and how to obtain well‑calibrated probabilities (e.g., Platt vs isotonic) after class weighting or resampling.

B) Imbalance: design training/evaluation to avoid leakage and handle the 0.4% base rate. Be specific about temporal CV (train<t<val<t<test), stratification, resampling only on train (SMOTE/ADASYN vs class weights vs focal loss), threshold selection under asymmetric costs (assume FN costs $100 and FP costs $2; derive the optimal threshold rule and how you’d estimate it), and metrics you’ll optimize (PR‑AUC, recall@precision≥0.9, expected cost).

C) Anomaly detection plan: choose and justify one method (Isolation Forest, One‑Class SVM, deep autoencoder) for cold‑start merchants with no labels; define features, windowing, contamination rate selection, and alerting. Explain validation without labels (synthetic anomalies, investigator concordance), plus drift monitoring, retraining cadence, and safe rollout.

D) List two subtle leakage risks (e.g., using post‑event chargebacks, future‑derived aggregations) and two ways you’ll detect/mitigate concept drift post‑deployment.
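
As a hedged illustration of the threshold rule the question asks for under asymmetric costs: if the model outputs a calibrated probability p of fraud, and correct decisions are assumed to cost nothing (an assumption not stated in the question), then flagging costs (1 − p)·C_FP in expectation and not flagging costs p·C_FN, so flagging is cheaper whenever p ≥ C_FP / (C_FP + C_FN) = 2 / 102 ≈ 0.0196. The sketch below computes that rule and also estimates a threshold empirically by minimizing total cost on a held‑out validation set, which is one way to proceed when calibration is imperfect. The function names and the toy scores are hypothetical and chosen only for illustration.

```python
# Costs taken from the question: a missed fraud (FN) costs $100, a false alarm (FP) costs $2.
# Assumption (not in the question): true positives and true negatives cost $0.
COST_FN = 100.0
COST_FP = 2.0


def bayes_threshold(cost_fp=COST_FP, cost_fn=COST_FN):
    """Cost-optimal cutoff for a calibrated P(fraud): flag when p >= this value."""
    return cost_fp / (cost_fp + cost_fn)


def total_cost(y_true, p_fraud, threshold, cost_fp=COST_FP, cost_fn=COST_FN):
    """Total dollar cost of flagging every transaction with p_fraud >= threshold."""
    cost = 0.0
    for y, p in zip(y_true, p_fraud):
        flagged = p >= threshold
        if flagged and y == 0:
            cost += cost_fp   # false positive: legitimate transaction flagged
        elif not flagged and y == 1:
            cost += cost_fn   # false negative: fraud missed
    return cost


def empirical_threshold(y_val, p_val, cost_fp=COST_FP, cost_fn=COST_FN):
    """Pick the cutoff that minimizes total cost on validation data.

    Candidates are the observed scores plus +inf (flag nothing).
    Ties are broken in favour of the lowest candidate.
    """
    candidates = sorted(set(p_val)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(y_val, p_val, t, cost_fp, cost_fn))


if __name__ == "__main__":
    # Toy validation scores, invented for illustration only.
    y_val = [0, 0, 1, 1]
    p_val = [0.01, 0.5, 0.005, 0.9]
    print("analytic threshold:", round(bayes_threshold(), 4))
    print("cost at analytic threshold:", total_cost(y_val, p_val, bayes_threshold()))
    print("empirical threshold:", empirical_threshold(y_val, p_val))
```

The analytic rule is only as good as the calibration of p; after class weighting or resampling the raw scores are typically shifted, which is why the question pairs threshold selection with Platt or isotonic calibration. The empirical search should be run on a validation window that comes strictly after the training window in time.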

Oct 13, 2025, 9:49 PM

