PracHub

Tune metrics for imbalanced classification

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in machine learning for rare-event detection, testing skills in preprocessing messy data, handling high-cardinality categoricals, designing validation splits, selecting imbalanced evaluation metrics and cost-sensitive decision thresholds, and reasoning about operational trade-offs.

Company: Other

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

You must detect fraudulent transactions where positives are 0.5% of all cases. The data has 10% missing values, heavy outliers, and high-cardinality categorical features. (a) Propose a preprocessing pipeline for missing values and outliers. (b) Choose training/validation splits and justify stratified cross-validation. (c) Select evaluation metrics; compare ROC-AUC vs PR-AUC vs F1 at a chosen threshold; define a cost-sensitive objective. (d) Give a concrete method to address class imbalance (e.g., calibrated class weights, focal loss, or SMOTE, taking care to avoid leakage). (e) Provide a real-world example where false positives are costlier than false negatives (or vice versa) and explain how that changes thresholding and monitoring.
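
For part (c), one common way to define a cost-sensitive objective is total misclassification cost. The sketch below is an illustration, not a prescribed answer: the symbols $C_{FP}$, $C_{FN}$, $p(x)$, and $\tau$ are introduced here, and it assumes calibrated probabilities $p(x) = P(\text{fraud} \mid x)$, constant per-error costs, and zero cost for correct decisions.

```latex
\begin{aligned}
\text{Cost}(\tau) &= C_{FP}\,\mathrm{FP}(\tau) + C_{FN}\,\mathrm{FN}(\tau) \\[4pt]
\text{flag } x \;&\iff\; p(x)\,C_{FN} > \bigl(1 - p(x)\bigr)\,C_{FP} \\
&\iff\; p(x) > \tau^{*} = \frac{C_{FP}}{C_{FP} + C_{FN}}
\end{aligned}
```

As a hypothetical example, $C_{FP} = 5$ and $C_{FN} = 500$ give $\tau^{*} = 5/505 \approx 0.0099$, far below the default 0.5. If the model's scores are not calibrated (for example after reweighting or resampling), this closed-form threshold does not apply directly and the threshold is better chosen empirically on held-out data.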

Oct 13, 2025, 9:49 PM

Fraud Detection With Rare Positives (0.5%) and Messy Data

You are designing a supervised transaction-level fraud detector. Positives (fraud) are rare at 0.5% of all cases. The dataset has ~10% missing values, heavy-tailed outliers, and high-cardinality categorical features (e.g., merchant_id, device_id).

Answer the following:

  1. Preprocessing: (a) Propose a concrete preprocessing pipeline to handle missing values and outliers for both numeric and categorical features. Address high-cardinality categoricals and leakage prevention.
  2. Training/Validation Splits: (b) Specify how you would split the data for training/validation/testing. Justify stratified cross-validation (and when to prefer time-aware or group-aware schemes).
  3. Evaluation and Costing: (c) Select and justify evaluation metrics in this imbalanced setting. Compare ROC-AUC vs PR-AUC vs F1 at a chosen threshold. Define a cost-sensitive objective and the optimal decision threshold given costs.
  4. Class Imbalance: (d) Provide a concrete method to address class imbalance (e.g., calibrated class weights, focal loss, or SMOTE). Explain exactly how to apply it without leakage.
  5. Business Trade-offs: (e) Give a real-world example where false positives are costlier than false negatives (or vice versa), and explain how that changes thresholding and monitoring in production.
