Design a Churn Model: Handle Missing Data and Justify

This question evaluates a data scientist's ability to work with messy, time-dependent subscription data: choosing missing-value strategies, handling class imbalance, setting up temporal validation, engineering features, selecting a model, and understanding ensemble methods and model generalization (Random Forest internals, overfitting, and underfitting).

Question

Churn Prediction on Messy Subscription Data

Context

You are building a binary churn-prediction model for a subscription product. Historical customer-level data contains usage/activity, billing/payments, support interactions, demographics, and plan details. The data is messy: many fields have missing values, there is class imbalance (churn is rarer than non-churn), and features are time-dependent. We aim to predict whether a customer will churn in the next period (e.g., the next 30 days) using only information available up to a cutoff date.
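
The "only information available up to a cutoff date" rule can be made concrete with a minimal sketch like the one below. It assumes each customer has a single cancellation date (or none) and a list of activity dates. The function names `churn_label` and `activity_features`, the record shapes, and the 30-day lookback are hypothetical illustrations, not part of the original problem. Labels look forward from the cutoff, while features look only backward.

```python
from datetime import date, timedelta


def churn_label(cancel_date, cutoff, horizon_days=30):
    """1 if the customer cancels in (cutoff, cutoff + horizon], 0 if still active,
    None if they had already cancelled by the cutoff (not in the scoring population)."""
    if cancel_date is not None and cancel_date <= cutoff:
        return None
    window_end = cutoff + timedelta(days=horizon_days)
    return int(cancel_date is not None and cancel_date <= window_end)


def activity_features(event_dates, cutoff, lookback_days=30):
    """Hypothetical features computed only from events on or before the cutoff (no look-ahead)."""
    past = [d for d in event_dates if d <= cutoff]
    window_start = cutoff - timedelta(days=lookback_days)
    last_seen = max(past, default=None)
    return {
        "events_in_lookback": sum(d > window_start for d in past),
        # No activity at all yields None: an informative gap, not random noise.
        "days_since_last_event": (cutoff - last_seen).days if last_seen is not None else None,
    }
```

Customers who had already cancelled by the cutoff get `None` as a label and would typically be dropped from that snapshot. A customer with no recorded activity gets `None` for `days_since_last_event`, which is the kind of not-at-random missingness the assumptions below describe.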

Assumptions:

Binary target: churn = 1 if a customer cancels or fails to renew in the next period; 0 otherwise.
Temporal validation is required (train on earlier periods, validate on later periods).
Some missingness is likely not at random (e.g., missing usage could reflect inactivity).
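
Taken together, the last two assumptions suggest fitting any imputation on the training fold of a time-based split and keeping an explicit missingness flag. The sketch below assumes one row per customer snapshot with a hypothetical `cutoff` field and two hypothetical columns, `usage_minutes` and `age`. The rule that missing usage means zero usage is an assumption standing in for domain knowledge, and median filling is just one reasonable default for the other columns.

```python
from datetime import date
from statistics import median


def temporal_split(rows, split_date):
    """Train on snapshots taken before split_date; validate on later snapshots."""
    train = [r for r in rows if r["cutoff"] < split_date]
    valid = [r for r in rows if r["cutoff"] >= split_date]
    return train, valid


def fit_medians(train_rows, columns):
    """Learn fill values from the training fold only, so validation data never leaks in."""
    medians = {}
    for c in columns:
        observed = [r[c] for r in train_rows if r[c] is not None]
        medians[c] = median(observed) if observed else 0.0
    return medians


def impute(rows, medians, zero_fill=()):
    """Fill gaps and add a <col>_missing flag so a model can learn from missingness itself."""
    out = []
    for r in rows:
        new = dict(r)
        for c, med in medians.items():
            is_missing = r[c] is None
            new[c + "_missing"] = int(is_missing)
            if is_missing:
                # Assumed domain rule: for zero_fill columns, missing means "none", not "unknown".
                new[c] = 0.0 if c in zero_fill else med
        out.append(new)
    return out


# Hypothetical snapshots; 2024-03-01 separates training from validation.
rows = [
    {"cutoff": date(2024, 1, 31), "usage_minutes": 120.0, "age": 34.0},
    {"cutoff": date(2024, 1, 31), "usage_minutes": None, "age": 41.0},
    {"cutoff": date(2024, 2, 29), "usage_minutes": 60.0, "age": None},
    {"cutoff": date(2024, 3, 31), "usage_minutes": None, "age": None},
]
train, valid = temporal_split(rows, split_date=date(2024, 3, 1))
medians = fit_medians(train, ["usage_minutes", "age"])
train_x = impute(train, medians, zero_fill={"usage_minutes"})
valid_x = impute(valid, medians, zero_fill={"usage_minutes"})
```

Because the medians come only from snapshots before the split date, the validation rows are treated the way future data would be in production. The `_missing` flags let a downstream model learn whether the absence of a value is itself predictive of churn.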

Tasks

How would you handle missing values in the training data and justify your approach?
Given this churn-prediction problem, which ML algorithm would you choose and why?
Explain how Random Forest works, including voting, feature bagging, and depth control.
Define overfitting vs. underfitting and describe techniques to detect and mitigate each.
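
For the Random Forest task, a from-scratch toy can make voting, feature bagging, and depth control concrete. This is a minimal sketch for binary 0/1 labels, not scikit-learn's implementation: it uses Gini impurity, a bootstrap sample per tree, a fresh random subset of `max_features` features at every split, `max_depth` and `min_leaf` as stopping rules, and a majority vote at prediction time. The synthetic features (days since last login, support tickets) and the 10% label noise are hypothetical, and the toy ignores missing values and class imbalance.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids, min_leaf):
    """Best (weighted_gini, feature, threshold), searching only the sampled features."""
    best = None
    for f in feature_ids:
        for t in sorted(set(row[f] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, max_depth, max_features, min_leaf, rng):
    """Grow one tree; a leaf is a class label, an internal node is (f, t, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    # Depth control: stop at max_depth, at pure nodes, or when no split can respect min_leaf.
    if depth >= max_depth or len(set(y)) == 1 or len(y) < 2 * min_leaf:
        return majority
    # Feature bagging: each split considers a fresh random subset of features.
    feature_ids = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, feature_ids, min_leaf)
    if split is None:
        return majority
    _, f, t = split
    go_left = [row[f] <= t for row in X]
    left = build_tree([r for r, g in zip(X, go_left) if g], [v for v, g in zip(y, go_left) if g],
                      depth + 1, max_depth, max_features, min_leaf, rng)
    right = build_tree([r for r, g in zip(X, go_left) if not g], [v for v, g in zip(y, go_left) if not g],
                       depth + 1, max_depth, max_features, min_leaf, rng)
    return (f, t, left, right)


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, max_features=1, min_leaf=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree sees a bootstrap sample (rows drawn with replacement).
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, max_features, min_leaf, rng))
    return forest


def predict_forest(forest, row):
    """Majority (hard) vote across trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


def accuracy(forest, X, y):
    return sum(predict_forest(forest, r) == v for r, v in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Hypothetical synthetic rows: [days_since_last_login, support_tickets].
    rng = random.Random(42)
    X = [[rng.randint(0, 60), rng.randint(0, 5)] for _ in range(160)]
    # Churn driven mostly by inactivity, with 10% of labels flipped as noise.
    y = [int(d > 30) ^ int(rng.random() < 0.1) for d, _ in X]
    X_tr, y_tr, X_va, y_va = X[:120], y[:120], X[120:], y[120:]
    for depth in (1, 3, 12):
        forest = fit_forest(X_tr, y_tr, max_depth=depth, max_features=1)
        print(depth, round(accuracy(forest, X_tr, y_tr), 2), round(accuracy(forest, X_va, y_va), 2))
```

Comparing training and validation accuracy across depths is the basic over/underfitting check: if both stay low, the model is underfitting; if training accuracy keeps rising while validation accuracy stalls or drops, the deeper trees are overfitting the noise. In scikit-learn's `RandomForestClassifier` the corresponding knobs are `n_estimators`, `max_features`, `max_depth`, and `min_samples_leaf`, and that class averages per-tree class probabilities rather than counting hard votes.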
