PracHub

Handle missing and unavailable predictive features

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a product analyst's machine learning skills: diagnosing data quality issues, handling feature availability and training-serving skew, and interpreting counterintuitive correlations. It tests both conceptual understanding and practical application in model development and deployment.

Company: Intuit

Role: Product Analyst

Category: Machine Learning

Difficulty: easy

Interview Round: Onsite

Oct 21, 2025, 12:00 AM
Scenario

You are building a model to predict whether a user will successfully file taxes (binary label success) for a TurboTax-like product.

One of the most predictive features is:

  • session_count = cumulative number of sessions a user has had in the product.

However:

  • In the training dataset, session_count has many values that are 0 and many that are missing.
  • In production, stakeholders claim that session_count is not available at scoring time (i.e., when you need to make the prediction), even though it appears in the schema.
  • Exploratory analysis shows session_count is negatively correlated with success.
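
A first diagnostic for the zero-versus-missing issue described above might look like the sketch below. It is only an illustration under stated assumptions: the `platform` segment column, the row layout, and the sample rows are hypothetical and not part of the question. The idea is to count zero, missing, and positive values per segment (a segment with concentrated missingness hints at an instrumentation gap), and to keep an explicit missing flag so that "no sessions" and "unknown" are not collapsed into one value.

```python
from collections import defaultdict


def profile_session_count(rows, segment_key="platform"):
    """Count missing / zero / positive session_count values per segment."""
    profile = defaultdict(lambda: {"missing": 0, "zero": 0, "positive": 0})
    for row in rows:
        value = row.get("session_count")
        if value is None:
            bucket = "missing"
        elif value == 0:
            bucket = "zero"
        else:
            bucket = "positive"
        profile[row.get(segment_key, "unknown")][bucket] += 1
    return {segment: dict(counts) for segment, counts in profile.items()}


def add_missing_indicator(rows, fill_value=0):
    """Return copies of rows with an explicit missing flag and a filled value.

    The flag lets a model treat "missing" differently from a true 0.
    """
    prepared = []
    for row in rows:
        value = row.get("session_count")
        new_row = dict(row)
        new_row["session_count_missing"] = 1 if value is None else 0
        new_row["session_count_filled"] = fill_value if value is None else value
        prepared.append(new_row)
    return prepared


# Hypothetical sample rows, for illustration only.
rows = [
    {"user_id": 1, "platform": "web", "session_count": 3},
    {"user_id": 2, "platform": "web", "session_count": 0},
    {"user_id": 3, "platform": "mobile", "session_count": None},
    {"user_id": 4, "platform": "mobile", "session_count": None},
]

profile = profile_session_count(rows)
prepared = add_missing_indicator(rows)
print(profile)
```

In this toy input all missing values sit in one segment, which is the kind of pattern that would prompt a closer look at how that segment is logged before choosing an imputation strategy.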

Questions

  1. Data quality / missingness: How would you investigate why session_count is often 0 or missing, and how would you treat these cases during modeling?
  2. Training-serving skew: If session_count is not available at inference time, what are your options? How do you decide whether to (a) drop it, (b) engineer a proxy, or (c) change the prediction timing / problem definition?
  3. Interpretation: Provide at least two plausible explanations for the negative correlation between session_count and success (including an “opposite viewpoint”), and describe what additional data or analyses you would use to validate/refute each explanation.
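
One way to probe the third question is to check whether the negative correlation survives after stratifying on a suspected confounder. The sketch below is a hedged illustration, not a reference solution: `return_complexity` is a hypothetical column, and the rows are synthetic toy data constructed so that users with complex returns have both more sessions and lower success. It compares the overall Pearson correlation with the correlation inside each stratum.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def correlation_by_stratum(rows, stratum_key):
    """Correlation of session_count with success within each stratum."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[stratum_key]].append(row)
    return {
        key: pearson(
            [r["session_count"] for r in group],
            [r["success"] for r in group],
        )
        for key, group in groups.items()
    }


# Synthetic toy data (hypothetical): (session_count, success, return_complexity).
raw = [
    (1, 0, "simple"), (2, 1, "simple"), (2, 1, "simple"), (3, 1, "simple"),
    (6, 0, "complex"), (7, 0, "complex"), (8, 0, "complex"), (9, 1, "complex"),
]
rows = [
    {"session_count": s, "success": y, "return_complexity": c}
    for s, y, c in raw
]

overall = pearson(
    [r["session_count"] for r in rows], [r["success"] for r in rows]
)
by_stratum = correlation_by_stratum(rows, "return_complexity")
print("overall:", overall)
print("by stratum:", by_stratum)
```

In this constructed example the pooled correlation is negative while the correlation within each stratum is positive, which is the pattern a confounding explanation would predict. If real data showed a negative correlation inside every stratum as well, the opposite viewpoint (repeated sessions reflect friction or confusion) would be harder to rule out.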

Constraints / expectations

  • Assume you have standard product event logs available in principle (page views, step completions, timestamps), but instrumentation may be imperfect.
  • Your answer should cover: leakage risk, feature availability, and how you would communicate tradeoffs to stakeholders.
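
For the leakage and availability concerns, one option is to rebuild the feature from raw event timestamps using only what would have been visible at scoring time. The sketch below is a minimal illustration under assumptions that are not in the question: a session is defined as a run of events separated by no more than 30 minutes of inactivity, and `sessions_as_of` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: 30 minutes of inactivity starts a new session.
SESSION_GAP = timedelta(minutes=30)


def sessions_as_of(event_times, scoring_time, gap=SESSION_GAP):
    """Count sessions using only events strictly before scoring_time.

    Events at or after scoring_time are ignored, so the training feature
    matches what could be computed when the prediction is made.
    """
    visible = sorted(t for t in event_times if t < scoring_time)
    sessions = 0
    previous = None
    for t in visible:
        if previous is None or t - previous > gap:
            sessions += 1
        previous = t
    return sessions


# Hypothetical event timestamps for one user.
events = [
    datetime(2025, 1, 1, 9, 0),
    datetime(2025, 1, 1, 9, 10),   # same session as 9:00
    datetime(2025, 1, 1, 11, 0),   # new session (gap exceeds 30 minutes)
    datetime(2025, 1, 5, 10, 0),   # happens after the scoring time below
]

as_of_count = sessions_as_of(events, scoring_time=datetime(2025, 1, 2))
lifetime_count = sessions_as_of(events, scoring_time=datetime(2026, 1, 1))
print(as_of_count, lifetime_count)
```

The gap between the as-of count and the lifetime count is the part of the feature that would leak future information if the training table stored the cumulative value at extraction time rather than at scoring time.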
