This question evaluates a candidate's ability to design and implement an end-to-end churn prediction pipeline in scikit-learn. It tests data splitting and leakage prevention, feature preprocessing for numeric and categorical variables, handling class imbalance, model selection and baselines, hyperparameter tuning, probability calibration, and packaging for production. It is commonly asked to assess practical machine learning engineering and applied model-development skills, in particular reproducible validation and proper use of tooling such as Pipeline, ColumnTransformer, GridSearchCV, cross-validation, and joblib. It falls under the Machine Learning category, with a primary focus on practical application complemented by conceptual understanding.
You are building a churn prediction model for a subscription business. Churn is defined as whether a customer cancels or becomes inactive in the next 30 days. The data is tabular with a mix of numeric and categorical features. The classes are typically imbalanced, with the positive class (churners) in the minority.
Describe, step-by-step, how you would construct, train, validate, and evaluate a churn-prediction model in scikit-learn, including:
Include and explain the use of Pipeline, ColumnTransformer, GridSearchCV, cross-validation, and joblib.
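As a starting point for an answer, the sketch below shows one way the named tools can fit together. It is a minimal illustration under stated assumptions, not a reference solution: the data is synthetic, the column names (`tenure_months`, `monthly_spend`, `plan`, `region`) are hypothetical, logistic regression with `class_weight="balanced"` is just one reasonable baseline, and the random stratified split stands in for the time-based split a real 30-day churn label would likely need to avoid leakage.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumption): hypothetical columns, churn more
# likely for short-tenure customers, positives in the minority.
rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, size=n).astype(float),
    "monthly_spend": rng.normal(50.0, 15.0, size=n),
    "plan": rng.choice(["basic", "plus", "pro"], size=n),
    "region": rng.choice(["na", "eu", "apac"], size=n),
})
churn_prob = 1.0 / (1.0 + np.exp(0.5 + 0.05 * X["tenure_months"].to_numpy()))
y = (rng.random(n) < churn_prob).astype(int)

num_cols = ["tenure_months", "monthly_spend"]
cat_cols = ["plan", "region"]

# Hold out a test set first; stratify to preserve the churn rate in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# ColumnTransformer applies different preprocessing to numeric and categorical columns.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

# Wrapping preprocessing and model in one Pipeline means every CV fold
# re-fits the imputer, scaler, and encoder on its own training part only.
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Stratified cross-validation inside GridSearchCV; average precision
# (area under the precision-recall curve) suits an imbalanced positive class.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    model,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},
    scoring="average_precision",
    cv=cv,
)
search.fit(X_train, y_train)

# Evaluate once on the untouched test set.
test_scores = search.predict_proba(X_test)[:, 1]
test_ap = average_precision_score(y_test, test_scores)
print("best params:", search.best_params_)
print("test average precision:", round(test_ap, 3))

# Persist the whole fitted pipeline so serving uses identical preprocessing.
model_path = os.path.join(tempfile.mkdtemp(), "churn_model.joblib")
joblib.dump(search.best_estimator_, model_path)
loaded = joblib.load(model_path)
```

Because the grid search tunes the full pipeline, the `clf__C` key addresses the `C` parameter of the step named `clf`. Note that `class_weight="balanced"` tends to distort predicted probabilities, so a fuller answer would add a calibration step (for example with scikit-learn's `CalibratedClassifierCV`) and compare against a trivial baseline before choosing a decision threshold.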