Boston Consulting Group Data Scientist Machine Learning Questions (Updated 2026)

Hard

Reduce overfitting under constraints

Reduce Overfitting Under Latency Constraints (Tabular Regression) Context (assumed) - You have a tabular regression model with a large generalization ...

33 people solved

Achieve 0.95 precision via thresholding

Deploying a High-Precision Classifier on an Imbalanced Dataset You are given a binary classification problem with 50,000 samples and ~5% positives. Th...

16 people solved

Data Scientist Locked

Explain AUC, imbalance, losses, and networks

Imbalanced Classification & Regression: ROC/PR, Losses, and Training Strategies You are evaluating a binary classifier and a regression head in a mach...

31 people solved

Build and evaluate imbalanced binary classifier

Take‑home: Imbalanced Binary Classification with Temporal Split, Calibration, and Operating Point Selection Context You are given an event‑level datas...

24 people solved

Data Scientist Locked

Build a leak-free sklearn pipeline

Take-home: Imbalanced Binary Classification Pipeline with scikit-learn You are training a binary classifier on tabular data with the following feature...

44 people solved

Explain AUC, activations, ensembles, and imbalance

Machine Learning Metrics and Modeling Choices — Multi-part You are given model scores and binary labels for a small dataset and asked to compute ROC A...

13 people solved

Easy

Scale and Normalize: When to Use Each Method?

Feature Scaling Before Modeling (CodeSignal Notebook) Context You're preparing features in a notebook step before training a model. You have a pandas ...

23 people solved

Differentiate Overfitting and Underfitting in Machine Learning

ML/DL Fundamentals for a Recommendation Engine Context You are preparing for a take-home assessment on ML/DL fundamentals relevant to building a recom...

23 people solved

Hard

Detect Data Leakage in Supervised Learning Pipelines

ML Take‑home: Bias–Variance, Regularization, Leakage, and From‑scratch Logistic Regression Context You are given user event logs in a Pandas dataframe...

25 people solved

Easy

Interpret AUC Values and Handle Class Imbalance Techniques

AUC and Class Imbalance in Binary Classification Context You are evaluating a binary classifier using ROC–AUC and need to reason about performance und...

31 people solved

Design and sample for credit default prediction

A bank wants a model to predict 90-day credit card default at account-month level for proactive outreach. Class prevalence in production is about 2% d...

37 people solved

Improve Model Generalization with Cross-Validation and Feature Engineering

Predict Next-Month Orders: Train/Test Split, Pipeline, and AUC Context You are given a cleaned tabular retail dataset as a pandas DataFrame df. The bi...

11 people solved

Easy

Train GradientBoostingClassifier with 5-Fold Cross-Validation

Final Model Training: GradientBoostingClassifier with 5-Fold CV Context Assume the notebook already contains a prepared feature matrix X and a binary ...

16 people solved