Build and evaluate a full ML pipeline

This is a machine learning interview question asked at Google for Data Scientist roles.

Question

You must predict both (1) probability that a user will spend >$0 in the next 7 days (classification) and (2) expected spend in the next 7 days (regression). Training data are events and orders up to 2025-08-31; predictions start on 2025-09-01. Design an end-to-end pipeline: feature generation (including time-windowed aggregates), leakage controls (e.g., excluding post-cutoff signals like refund_time), time-based cross-validation, handling class imbalance, and model choices for each task. Specify metrics (e.g., PR-AUC, calibrated Brier, pinball loss for quantiles), a calibration plan, and how you’d pick a threshold given an asymmetric cost matrix. Describe how you’d detect and mitigate segment-specific regressions, choose and justify an offline/online evaluation plan (with rollout and holdbacks), and set up post-deployment monitoring for drift, label delay, and model decay. Finally, provide two concrete examples of features that are predictive but risky for leakage and how you’d re-specify them safely.
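As a starting point for the leakage and threshold parts of the question, here is a minimal sketch rather than a reference solution. It assumes a hypothetical order record with the fields `user_id`, `order_time`, `amount`, and `refund_time` (only `refund_time` is named in the question; the rest are illustrative), and it assumes the cost matrix has zero cost for correct decisions. The first function builds time-windowed aggregates as of the cutoff and counts a refund only if it was already known at the cutoff. The second derives the decision threshold for a calibrated probability from the false-positive and false-negative costs.

```python
from datetime import datetime, timedelta

# Last instant of the training data, per the question (2025-08-31).
CUTOFF = datetime(2025, 8, 31, 23, 59, 59)

# Hypothetical order records; field names other than refund_time are assumptions.
ORDERS = [
    {"user_id": 1, "order_time": datetime(2025, 8, 5), "amount": 40.0,
     "refund_time": None},
    {"user_id": 1, "order_time": datetime(2025, 8, 28), "amount": 25.0,
     "refund_time": datetime(2025, 9, 3)},   # refund happens AFTER the cutoff
    {"user_id": 2, "order_time": datetime(2025, 7, 1), "amount": 90.0,
     "refund_time": datetime(2025, 7, 10)},  # refund known before the cutoff
    {"user_id": 2, "order_time": datetime(2025, 9, 2), "amount": 15.0,
     "refund_time": None},                   # order itself is post-cutoff
]


def windowed_features(orders, user_id, cutoff, window_days):
    """Aggregate one user's orders in (cutoff - window_days, cutoff].

    Leakage controls:
      * orders placed after the cutoff are excluded entirely;
      * a refund counts only if refund_time <= cutoff, i.e. it was
        observable at prediction time.
    """
    start = cutoff - timedelta(days=window_days)
    in_window = [
        o for o in orders
        if o["user_id"] == user_id and start < o["order_time"] <= cutoff
    ]
    refunds_known = [
        o for o in in_window
        if o["refund_time"] is not None and o["refund_time"] <= cutoff
    ]
    return {
        f"orders_{window_days}d": len(in_window),
        f"spend_{window_days}d": sum(o["amount"] for o in in_window),
        f"refunds_known_{window_days}d": len(refunds_known),
    }


def cost_threshold(cost_fp, cost_fn):
    """Threshold on a calibrated probability p for taking the positive action.

    Assumes zero cost for true positives and true negatives. Acting is
    cheaper in expectation when (1 - p) * cost_fp < p * cost_fn, which
    rearranges to p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


if __name__ == "__main__":
    print(windowed_features(ORDERS, 1, CUTOFF, 30))
    print(windowed_features(ORDERS, 2, CUTOFF, 90))
    print(cost_threshold(cost_fp=1.0, cost_fn=4.0))
```

The same pattern covers the "predictive but risky" features the question asks for: a raw "order was refunded" flag leaks when the refund lands after the cutoff, while "refunds known as of the cutoff" does not. The threshold formula holds only if the classifier's probabilities are calibrated, which is why the calibration plan comes before threshold selection.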
