PracHub

Build and evaluate a full ML pipeline

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design and operationalize end-to-end machine learning pipelines, covering competencies in feature engineering, leakage control, temporal cross-validation, evaluation and calibration, thresholding under asymmetric costs, deployment rollouts, and post-deployment monitoring.

Company: Google

Role: Data Scientist

Category: Machine Learning

Difficulty: Medium

Interview Round: Technical Screen

You must predict both (1) probability that a user will spend >$0 in the next 7 days (classification) and (2) expected spend in the next 7 days (regression). Training data are events and orders up to 2025-08-31; predictions start on 2025-09-01. Design an end-to-end pipeline: feature generation (including time-windowed aggregates), leakage controls (e.g., excluding post-cutoff signals like refund_time), time-based cross-validation, handling class imbalance, and model choices for each task. Specify metrics (e.g., PR-AUC, calibrated Brier, pinball loss for quantiles), a calibration plan, and how you’d pick a threshold given an asymmetric cost matrix. Describe how you’d detect and mitigate segment-specific regressions, choose and justify an offline/online evaluation plan (with rollout and holdbacks), and set up post-deployment monitoring for drift, label delay, and model decay. Finally, provide two concrete examples of features that are predictive but risky for leakage and how you’d re-specify them safely.
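
One way to make the leakage controls concrete is a point-in-time feature builder. The sketch below is illustrative only and not part of the original question: the record layout, field names (`order_time`, `amount`), and function names are assumptions, and only `refund_time` and the 2025-09-01 prediction start come from the prompt. It shows time-windowed aggregates that read nothing at or after the cutoff, a refund feature that counts only refunds already known at the cutoff, and labels built strictly from the following 7 days.

```python
from datetime import datetime, timedelta

# Hypothetical order records; the field names are assumptions for illustration.
ORDERS = [
    {"user_id": "u1", "order_time": datetime(2025, 8, 5), "amount": 20.0,
     "refund_time": None},
    {"user_id": "u1", "order_time": datetime(2025, 8, 28), "amount": 30.0,
     "refund_time": datetime(2025, 9, 3)},   # refund happens after the cutoff
    {"user_id": "u1", "order_time": datetime(2025, 8, 30), "amount": 10.0,
     "refund_time": datetime(2025, 8, 31)},  # refund already known at the cutoff
    {"user_id": "u1", "order_time": datetime(2025, 9, 2), "amount": 99.0,
     "refund_time": None},                   # post-cutoff order: label only
]


def windowed_features(orders, user_id, cutoff, window_days):
    """Aggregate one user's orders in [cutoff - window_days, cutoff)."""
    start = cutoff - timedelta(days=window_days)
    in_window = [
        o for o in orders
        if o["user_id"] == user_id and start <= o["order_time"] < cutoff
    ]
    # Count a refund only if it had already occurred before the cutoff;
    # later refunds are future information and would leak.
    refunds_known = [
        o for o in in_window
        if o["refund_time"] is not None and o["refund_time"] < cutoff
    ]
    return {
        "order_count": len(in_window),
        "gross_spend": sum(o["amount"] for o in in_window),
        "refunded_spend_known": sum(o["amount"] for o in refunds_known),
    }


def label_next_7d(orders, user_id, cutoff):
    """Build both targets from orders in [cutoff, cutoff + 7 days)."""
    end = cutoff + timedelta(days=7)
    spend = sum(
        o["amount"] for o in orders
        if o["user_id"] == user_id and cutoff <= o["order_time"] < end
    )
    return {"spent_any": spend > 0, "spend": spend}


CUTOFF = datetime(2025, 9, 1)  # first prediction day; features stop before it
features_7d = windowed_features(ORDERS, "u1", CUTOFF, 7)
features_30d = windowed_features(ORDERS, "u1", CUTOFF, 30)
label = label_next_7d(ORDERS, "u1", CUTOFF)
```

Passing the cutoff as a parameter means the same two functions can be rerun at earlier cutoffs to build the folds for time-based cross-validation, so every fold sees features and labels constructed the same way as at prediction time.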

Oct 13, 2025, 9:49 PM