PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/Machine Learning/Google

Diagnose and fix flawed model fit

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in applied supervised learning diagnostics, including feature encoding, feature scaling, class imbalance mitigation, evaluation metric selection, calibration, and fairness-aware objective design.

  • hard
  • Google
  • Machine Learning
  • Data Scientist

Diagnose and fix flawed model fit

Company: Google

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

You are handed an existing binary classifier (predicting churn=1) with the following issues: (a) the feature “month” is encoded as an integer 1–12 and used linearly; (b) continuous features of wildly different scales were not standardized; (c) positives are 3% of the data; (d) model is evaluated by overall accuracy. The business requires Recall ≥ 80% at Precision ≥ 30%. - Identify and explain the modeling mistakes. For month, propose and justify two encodings (e.g., one-hot vs. cyclical sin/cos) and when you’d prefer each. - Describe a training procedure to address scale and imbalance: feature standardization, class-weighted loss vs. resampling (SMOTE vs. undersampling), and how you’d select the weight ratio or sampling rate without biasing validation. - Specify an evaluation protocol: stratified cross-validation, primary metrics (PR AUC, recall@precision≥0.30), calibration assessment, and threshold selection to satisfy the operating point. Include how you’d compute and report confidence intervals for these metrics. - If the dataset is imbalanced across subgroups, define a cost-sensitive or group-weighted objective and the fairness/coverage checks you’d add before deployment. Provide a concrete example of a failure mode you would catch and how you’d mitigate it.

Quick Answer: This question evaluates a data scientist's competency in applied supervised learning diagnostics, including feature encoding, feature scaling, class imbalance mitigation, evaluation metric selection, calibration, and fairness-aware objective design.

Related Interview Questions

  • Explain ranking cold-start strategies - Google (medium)
  • Explain LLM fine-tuning and generative models - Google (medium)
  • Compare NLP tokenization and LLM recommendations - Google (medium)
  • Explain LLM lifecycle and trade-offs - Google (medium)
  • Build a bigram next-word predictor with weighted sampling - Google (medium)
Google logo
Google
Oct 13, 2025, 9:49 PM
Data Scientist
Technical Screen
Machine Learning
4
0
Loading...

Fixing a Churn Classifier: Encoding, Imbalance, Evaluation, and Fairness

Context

You inherit a binary classifier that predicts churn=1. The current implementation has the following issues: (a) the feature "month" is encoded as an integer 1–12 and used linearly; (b) continuous features of very different scales are not standardized; (c) positives are 3% of the data; (d) model performance is reported by overall accuracy. The business requirement is Recall ≥ 80% at Precision ≥ 30%.

Tasks

  1. Identify and explain the modeling mistakes. For the "month" feature, propose and justify two encodings (e.g., one-hot vs. cyclical sin/cos) and state when you’d prefer each.
  2. Describe a training procedure to address scale and class imbalance, including:
    • Feature standardization choices.
    • Class-weighted loss vs. resampling (SMOTE vs. undersampling).
    • How to select the weight ratio or sampling rate without biasing validation.
  3. Specify an evaluation protocol covering:
    • Stratified cross-validation.
    • Primary metrics (PR AUC, recall at precision ≥ 0.30).
    • Calibration assessment and how to select a threshold that satisfies the operating point.
    • How to compute and report confidence intervals for these metrics.
  4. If the dataset is imbalanced across subgroups, define a cost-sensitive or group-weighted objective and the fairness/coverage checks you’d add before deployment. Provide a concrete example of a failure mode you would catch and how you’d mitigate it.

Solution

Show

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More Machine Learning•More Google•More Data Scientist•Google Data Scientist•Google Machine Learning•Data Scientist Machine Learning
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.