PracHub

Improve Model Generalization with Cross-Validation and Feature Engineering

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's practical competence in supervised machine learning, covering stratified train/test splitting, reproducible preprocessing pipelines that standardize numeric features and robustly encode categoricals, training gradient-boosted models, and assessing discrimination with ROC AUC.


Company: Boston Consulting Group

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Take-home Project

Scenario

Using the cleaned retail data, you must build a model to predict whether a customer will place an order next month.

Question

Split the prepared dataset into 80/20 train–test sets with stratification on the target variable. Standardize numeric features and one-hot encode categorical features in a reproducible pipeline. Train a gradient-boosted tree (e.g., XGBoost or LightGBM) and report AUC on the held-out test set. List two techniques you would use to improve the model’s generalization if AUC is low.

Hints

Demonstrate scikit-learn pipelines and proper evaluation.

Aug 4, 2025, 10:55 AM

Predict Next-Month Orders: Train/Test Split, Pipeline, and AUC

Context

You are given a cleaned tabular retail dataset as a pandas DataFrame df. The binary target column will_order_next_month indicates whether a customer will place an order in the following month (1 = yes, 0 = no).

Tasks

  1. Split the data into 80/20 train–test sets with stratification on the target.
  2. Build a reproducible scikit-learn pipeline that:
    • Standardizes numeric features.
    • One-hot encodes categorical features (robust to unseen categories at test time).
  3. Train a gradient-boosted tree model (e.g., XGBoost or LightGBM).
  4. Report ROC AUC on the held-out test set.
  5. If AUC is low, list two techniques you would use to improve model generalization.
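
Tasks 1 to 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: the real df is not available here, so make_demo_df builds a synthetic stand-in whose feature columns (orders_last_90d, avg_spend, region) are hypothetical, and scikit-learn's GradientBoostingClassifier is swapped in for XGBoost or LightGBM so the example needs no extra dependency. An XGBClassifier or LGBMClassifier should be able to take its place as the final pipeline step.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

SEED = 42
TARGET = "will_order_next_month"


def make_demo_df(n_rows=2000, seed=SEED):
    """Synthetic stand-in for df; every feature column here is hypothetical."""
    rng = np.random.default_rng(seed)
    orders = rng.poisson(3, n_rows)
    spend = rng.gamma(2.0, 50.0, n_rows)
    region = rng.choice(["north", "south", "east", "west"], n_rows)
    logit = 0.8 * (orders - 3) + 0.005 * (spend - 100) + 0.5 * (region == "north") - 0.3
    target = (rng.random(n_rows) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return pd.DataFrame(
        {"orders_last_90d": orders, "avg_spend": spend, "region": region, TARGET: target}
    )


def build_pipeline(numeric_cols, categorical_cols, seed=SEED):
    """Preprocessing and model in one estimator, so scaling is fit on training data only."""
    preprocess = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numeric_cols),
            # handle_unknown="ignore" encodes categories unseen during fit as all zeros.
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )
    model = GradientBoostingClassifier(random_state=seed)
    return Pipeline([("preprocess", preprocess), ("model", model)])


def split_fit_score(df, seed=SEED):
    """Stratified 80/20 split, fit on train, ROC AUC on the held-out test set."""
    X = df.drop(columns=[TARGET])
    y = df[TARGET]
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    pipe = build_pipeline(numeric_cols, categorical_cols, seed)
    pipe.fit(X_train, y_train)
    # ROC AUC needs scores, not hard labels, so use the positive-class probability.
    proba = pipe.predict_proba(X_test)[:, 1]
    return pipe, roc_auc_score(y_test, proba), (y_train, y_test)


pipe, auc, (y_train, y_test) = split_fit_score(make_demo_df())
print(f"Test ROC AUC: {auc:.3f}")
```

Keeping the scaler and encoder inside the Pipeline means they are fit only on the training rows, which avoids leaking test-set statistics into preprocessing. Tree ensembles do not strictly need standardized inputs, but the step is included because the task asks for it.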

Hints

  • Demonstrate scikit-learn pipelines and proper evaluation.
  • Use ColumnTransformer to preprocess numerics and categoricals in one pipeline.
  • Ensure reproducibility with fixed random seeds.
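
For task 5, two commonly used options are (a) tuning regularizing hyperparameters with stratified k-fold cross-validation instead of a single split, and (b) early stopping on an internal validation fraction. The sketch below combines both. It is an illustration only: it uses numeric toy data from make_classification rather than the retail DataFrame, scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost or LightGBM, and a small grid whose values are arbitrary assumptions rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42

# Toy, numeric-only data standing in for the prepared retail features.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, weights=[0.7, 0.3], random_state=SEED
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=SEED
)

pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        (
            "model",
            GradientBoostingClassifier(
                n_estimators=200,          # upper bound; early stopping may use fewer trees
                subsample=0.8,             # row subsampling adds a regularizing effect
                validation_fraction=0.15,  # held out inside each fit for early stopping
                n_iter_no_change=10,       # stop once the validation score stalls for 10 rounds
                random_state=SEED,
            ),
        ),
    ]
)

# Shallower trees and a smaller learning rate usually reduce overfitting.
param_grid = {
    "model__max_depth": [2, 3],
    "model__learning_rate": [0.05, 0.1],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=cv)
search.fit(X_train, y_train)

cv_auc = search.best_score_  # mean ROC AUC across folds for the best setting
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, f"CV AUC: {cv_auc:.3f}", f"Test AUC: {test_auc:.3f}")
```

The test set is touched only once, after the search has picked its settings on the training folds. A large gap between the cross-validated AUC and the test AUC would suggest the model is still overfitting or that the split is not representative.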

