This question evaluates a data scientist's practical competence in supervised machine learning, covering stratified train/test splitting, reproducible preprocessing pipelines that standardize numeric features and robustly encode categoricals, training gradient-boosted models, and assessing discrimination with ROC AUC.
You are given a cleaned tabular retail dataset as a pandas DataFrame df. The binary target column will_order_next_month indicates whether a customer will place an order in the following month (1 = yes, 0 = no).
ColumnTransformer
to preprocess numerics and categoricals in one pipeline.
Login required