How do I approach Machine Learning interview questions?

Machine Learning questions require a solid understanding of core concepts and plenty of practice. PracHub provides solutions with explanations to help you master machine learning interviews.

What difficulty level is this interview question?

This is an easy Machine Learning question, commonly asked during onsite rounds at Intuit.

What role is this question designed for?

This question is commonly asked of Product Analyst candidates at Intuit during technical interviews.

Handle missing and unavailable predictive features

Quick Overview

This question evaluates a product analyst's ability to diagnose data quality issues, handle feature availability and training-serving skew, and interpret counterintuitive correlations. It tests both conceptual understanding and practical judgment in model development and deployment.

Scenario

You are building a model to predict whether a user will successfully file taxes (binary label success) for a TurboTax-like product.

One of the most predictive features is:

session_count = cumulative number of sessions a user has had in the product.
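Because session_count is cumulative, its value depends on when it is computed, and a count taken after the prediction moment leaks future behavior into training. A minimal sketch, assuming a hypothetical list of session start timestamps per user and a hypothetical logging_start date when session instrumentation began: count only sessions that started strictly before the prediction cutoff, and return None (unknown) rather than 0 when the cutoff predates logging.

```python
from datetime import datetime
from typing import Optional, Sequence


def session_count_as_of(
    session_starts: Sequence[datetime],
    cutoff: datetime,
    logging_start: datetime,
) -> Optional[int]:
    """Point-in-time session_count for one user (hypothetical helper).

    Only sessions that started strictly before `cutoff` are counted, so
    activity after the prediction moment cannot leak into the feature.
    If `cutoff` falls before session logging began, the true count could
    not have been observed, so return None instead of a misleading 0.
    Assumes no sessions occurred before `logging_start`.
    """
    if cutoff < logging_start:
        return None
    return sum(1 for start in session_starts if start < cutoff)
```

For example, with sessions on 2024-02-01, 2024-03-01, and 2024-04-10 and a cutoff of 2024-03-15, the as-of count is 2, whereas a count taken at the end of the season would be 3.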

However:

In the training dataset, session_count has many values that are 0 and many that are missing.
In production, stakeholders claim that session_count is not available at scoring time (i.e., when you need to make the prediction), even though it appears in the schema.
Exploratory analysis shows session_count is negatively correlated with success.
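Before modeling, it helps to check whether 0 and missing behave differently from each other and from positive counts, instead of assuming they mean the same thing. A minimal sketch over hypothetical (session_count, success) pairs, with None standing for a missing value; the bucket names are illustrative, not part of any existing schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def bucket(session_count: Optional[int]) -> str:
    """Keep missing, zero, and positive counts distinct (never fill missing with 0)."""
    if session_count is None:
        return "missing"
    if session_count == 0:
        return "zero"
    return "positive"


def success_rate_by_bucket(
    rows: Iterable[Tuple[Optional[int], bool]],
) -> Dict[str, Tuple[int, float]]:
    """Return {bucket: (n_users, success_rate)} from (session_count, success) pairs."""
    tallies = defaultdict(lambda: [0, 0])  # bucket -> [n_users, n_success]
    for session_count, success in rows:
        b = bucket(session_count)
        tallies[b][0] += 1
        tallies[b][1] += int(success)
    return {b: (n, s / n) for b, (n, s) in tallies.items()}
```

If the missing and zero buckets have clearly different success rates, that is a hint they arise from different mechanisms (for example, a logging gap versus a user who genuinely never started a session) and should not share one encoding.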

Questions

Data quality / missingness: How would you investigate why session_count is often 0 or missing, and how would you treat these cases during modeling?
Training-serving skew: If session_count is not available at inference time, what are your options? How do you decide whether to (a) drop it, (b) engineer a proxy, or (c) change the prediction timing / problem definition?
Interpretation: Provide at least two plausible explanations for the negative correlation between session_count and success (including an “opposite viewpoint”), and describe what additional data or analyses you would use to validate/refute each explanation.
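One way to probe a confounding explanation (for example, that more complex returns need more sessions and also fail more often) is to compare success rates within strata of the suspected confounder. A minimal sketch, assuming a hypothetical stratum label for return complexity and a hypothetical threshold separating low- from high-session users: if the high-versus-low gap shrinks toward zero within strata, the confounder may account for much of the raw negative correlation; if it persists, explanations such as users returning repeatedly because they are stuck remain plausible.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def within_stratum_gap(
    rows: Iterable[Tuple[str, int, bool]],
    threshold: int,
) -> Dict[str, Optional[float]]:
    """Success-rate gap (high-session minus low-session users) inside each stratum.

    rows: (stratum, session_count, success), where `stratum` is a hypothetical
    return-complexity label. Users with session_count >= threshold are "high".
    A stratum that lacks either group gets None.
    """
    # stratum -> group -> [n_users, n_success]
    tallies = defaultdict(lambda: {"low": [0, 0], "high": [0, 0]})
    for stratum, session_count, success in rows:
        group = "high" if session_count >= threshold else "low"
        tallies[stratum][group][0] += 1
        tallies[stratum][group][1] += int(success)

    gaps = {}
    for stratum, groups in tallies.items():
        (n_low, s_low), (n_high, s_high) = groups["low"], groups["high"]
        if n_low == 0 or n_high == 0:
            gaps[stratum] = None
        else:
            gaps[stratum] = s_high / n_high - s_low / n_low
    return gaps


if __name__ == "__main__":
    # Toy rows: pooled, high-session users succeed less often (1/3 vs 2/3),
    # but within each stratum the gap is 0.0.
    toy = [
        ("simple", 1, True), ("simple", 1, True), ("simple", 5, True),
        ("complex", 1, False), ("complex", 6, False), ("complex", 7, False),
    ]
    print(within_stratum_gap(toy, threshold=3))
```

Stratification only controls for the confounders you can measure; a gap that survives it is consistent with, but does not prove, the opposite viewpoint.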

Constraints / expectations

Assume you have standard product event logs available in principle (page views, step completions, timestamps), but instrumentation may be imperfect.
Your answer should cover: leakage risk, feature availability, and how you would communicate tradeoffs to stakeholders.
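Because instrumentation may be imperfect, it is usually safer not to collapse "missing" into 0. A minimal sketch of one common encoding, assuming missing values arrive as None: an explicit missingness flag, a zero flag, and a log-scaled count. The function name is hypothetical.

```python
import math
from typing import List, Optional


def encode_session_count(session_count: Optional[int]) -> List[float]:
    """Encode a possibly-missing session_count as three model inputs (hypothetical helper).

    Returns [is_missing, is_zero, log1p(count)]. The two flags let the model
    learn separate effects for "not logged" and "logged but zero"; the value
    column is filled with a neutral 0.0 when the count is missing.
    """
    if session_count is None:
        return [1.0, 0.0, 0.0]
    is_zero = 1.0 if session_count == 0 else 0.0
    return [0.0, is_zero, math.log1p(session_count)]
```

The flags only help if the same missingness pattern also occurs at scoring time; if missingness in training comes from a logging gap that production does not share, the indicator itself becomes another source of training-serving skew.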
