Handle missing data and outliers robustly

This question tests machine learning preprocessing and robustness skills: handling missingness mechanisms (missing at random, MAR, vs missing not at random, MNAR), treating outliers, adapting feature handling to linear and tree-based algorithms, and empirically assessing probability calibration and interpretability.

Question

Customer Churn Modeling: Preprocessing, Missingness, Outliers, and Evaluation

Context

You are building a binary churn model for a consumer subscription/financial product. Features include:

Numeric spend: heavy right tail with ~2% extreme outliers.
Count variables: many zeros.
Categorical plan types (low to moderate cardinality).
Missing data: a mix of MAR and MNAR (e.g., some high-spend users omit income).

Answer the following:

Tasks

Propose end-to-end preprocessing pipelines for both:
- (A) Linear/logistic models, and
- (B) Tree ensembles (e.g., XGBoost/LightGBM/Random Forest).
Both pipelines should cover imputation (median, KNN, MICE, model-based), missingness indicators, robust scaling, and outlier treatment (winsorization vs robust estimators vs isolation-based filters).
Explain when each choice helps or hurts and why (e.g., winsorization in logistic vs tree splits; leakage risks in MICE/KNN; effects of scaling on KNN; when to avoid isolation forest).
Describe how you would empirically test the pipeline’s impact on probability calibration and SHAP explanations without optimistic bias.
If ~10% of records are MNAR on a key feature, what modeling and data-collection strategies would you use to mitigate bias?
