PracHub

Detect Data Leakage in Supervised Learning Pipelines

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in detecting and preventing data leakage in supervised learning pipelines, understanding bias–variance decomposition, recognizing regularization effects on sparsity, and implementing a temporally correct logistic regression with held‑out AUC evaluation.


Company: Boston Consulting Group

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Take-home Project

Scenario

Company screens ML engineers with a 90-minute CodeSignal test containing conceptual MCQs and Python modeling tasks.

Question

  • State and interpret the bias and variance terms in the bias–variance decomposition.
  • Which regularization technique(s) can shrink linear-model coefficients exactly to zero and why?
  • Name two practical approaches for detecting data leakage in a supervised learning pipeline.
  • Given dataframe df(user_id, event_time, event_type, purchase), build a binary classifier predicting whether a user will purchase within the next 7 days and report AUC on a held-out set.
  • Implement logistic regression with gradient descent using only numpy; provide convergence diagnostics.

Hints

Discuss bias-variance trade-off, L1 geometry, validation splits, temporal leakage checks, and write clean, vectorized Python.
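
For the bias–variance part of the question, the standard decomposition under squared-error loss can be written as below. The notation is an assumption made here for illustration and does not come from the original page: the data follow y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fitted on a random training set, and expectations are taken over training sets and noise.

```latex
% Assumed setup: y = f(x) + \varepsilon, \; \mathbb{E}[\varepsilon] = 0, \; \operatorname{Var}(\varepsilon) = \sigma^2
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Read this way, bias measures how far the average fitted model is from the true function, and variance measures how much the fitted model changes from one training set to another.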

Related Interview Questions

  • Design and sample for credit default prediction - Boston Consulting Group (medium)
  • Explain AUC, imbalance, losses, and networks - Boston Consulting Group (medium)
  • Build and evaluate imbalanced binary classifier - Boston Consulting Group (medium)
  • Reduce overfitting under constraints - Boston Consulting Group (hard)
  • Achieve 0.95 precision via thresholding - Boston Consulting Group (medium)
Aug 4, 2025, 10:55 AM
ML Take‑home: Bias–Variance, Regularization, Leakage, and From‑scratch Logistic Regression

Context

You are given user event logs in a pandas DataFrame df with columns:

  • user_id: unique user identifier
  • event_time: timestamp of the event
  • event_type: categorical event name (e.g., view, click, add_to_cart, purchase)
  • purchase: binary indicator (0/1) of whether the event is a purchase

Your goal is to build a leakage‑free binary classifier that predicts whether a user will purchase within the next 7 days, then evaluate AUC on a held‑out set.
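
One way to frame the goal without temporal leakage is to fix an anchor time, compute features only from events at or before it, and derive the label from the 7 days that follow. The sketch below is an illustration under assumptions, not a reference solution: the function name build_snapshot, the count and recency features, and the column prefix n_ are all hypothetical choices, and event_time is assumed to already be a datetime column.

```python
import pandas as pd

def build_snapshot(df, anchor, horizon_days=7):
    """One row per user: features from events at or before `anchor`,
    label from purchases in (anchor, anchor + horizon_days]."""
    anchor = pd.Timestamp(anchor)
    horizon_end = anchor + pd.Timedelta(days=horizon_days)

    # Strict split in time: nothing after the anchor may feed a feature.
    past = df[df["event_time"] <= anchor]
    future = df[(df["event_time"] > anchor) & (df["event_time"] <= horizon_end)]

    # Features (hypothetical): per-user counts of each past event type.
    feats = pd.crosstab(past["user_id"], past["event_type"]).add_prefix("n_")

    # Recency in days between the anchor and the user's last past event.
    last_seen = past.groupby("user_id")["event_time"].max()
    feats["days_since_last"] = (anchor - last_seen).dt.total_seconds() / 86400.0

    # Label: 1 if the user has any purchase event inside the forward window.
    buyers = future.loc[future["purchase"] == 1, "user_id"].unique()
    feats["label"] = feats.index.isin(buyers).astype(int)
    return feats
```

A temporally correct hold-out could then be built by calling this with an earlier anchor for training and a later anchor, at least 7 days after the first, for evaluation, so that the training label window never overlaps the evaluation period. Users with no events before the anchor are absent from the snapshot in this sketch.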

Tasks

  1. Bias–variance decomposition
    • State the bias and variance terms and interpret them in the bias–variance decomposition.
  2. Regularization and sparsity
    • Which regularization technique(s) can shrink linear‑model coefficients exactly to zero, and why?
  3. Detecting data leakage
    • Name two practical approaches for detecting data leakage in a supervised learning pipeline.
  4. Modeling: Logistic regression from scratch
    • Using df(user_id, event_time, event_type, purchase), build a binary classifier to predict whether a user will purchase within the next 7 days.
    • Use a temporally correct split and report AUC on a held‑out set.
    • Implement logistic regression with gradient descent using only numpy for the model (pandas allowed for data prep). Provide basic convergence diagnostics.
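
The modeling task can be sketched with batch gradient descent on the mean log-loss. This is a minimal illustration under assumptions rather than a definitive implementation: the function names, the default learning rate, the optional L2 penalty, and the loss-change stopping rule are choices made here, and X is assumed to be a float array already standardized with training-set statistics.

```python
import numpy as np

def sigmoid(z):
    # Clip logits so np.exp cannot overflow for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35.0, 35.0)))

def log_loss(y, p, eps=1e-12):
    # Clip probabilities away from 0 and 1 to keep the logs finite.
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_logreg(X, y, lr=0.1, n_iter=5000, l2=0.0, tol=1e-8):
    """Batch gradient descent on mean log-loss with an optional L2 penalty.
    Returns weights, intercept, and a (loss, gradient norm) history."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    history = []  # one (loss, gradient norm) pair per iteration
    for it in range(n_iter):
        p = sigmoid(X @ w + b)
        err = p - y
        grad_w = X.T @ err / n + l2 * w   # the intercept is not penalized
        grad_b = err.mean()
        loss = log_loss(y, p) + 0.5 * l2 * (w @ w)
        gnorm = np.sqrt(grad_w @ grad_w + grad_b ** 2)
        history.append((loss, gnorm))
        # Stop once the loss changes by less than the tolerance.
        if it > 0 and abs(history[-2][0] - loss) < tol:
            break
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, np.array(history)
```

The returned history gives basic convergence diagnostics: the loss column should decrease steadily and the gradient-norm column should approach zero. A loss that oscillates or grows usually suggests the learning rate is too large for the feature scaling.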

Implementation requirements

  • Ensure no temporal leakage: features must use data up to an anchor time; labels look forward 7 days after the anchor.
  • Clean, vectorized Python; no sklearn for the model or metrics (implement AUC yourself).
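
Since sklearn metrics are ruled out, AUC can be computed from ranks through its equivalence with the Mann-Whitney U statistic. The sketch below is one possible numpy-only version; the function name roc_auc and the choice to raise an error for single-class input are assumptions made for illustration.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Group equal scores, then give every member of a group its mean 1-based rank.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)               # highest rank inside each tie group
    avg_rank = last_rank - (counts - 1) / 2.0   # mean rank inside each tie group
    ranks = avg_rank[inverse]

    # U counts positive-negative pairs ranked correctly (ties count one half).
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

For example, roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) evaluates to 0.75, because three of the four positive-negative pairs are ordered correctly.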
