Handle missing and unavailable predictive features

This is a Machine Learning interview question from Intuit for Product Analyst roles. View the full question and solution on PracHub.

Question

Scenario

You are building a model to predict whether a user will successfully file taxes (binary label success) for a TurboTax-like product.

One of the most predictive features is:

session_count = cumulative number of sessions a user has had in the product.

However:

In the training dataset, session_count has many values that are 0 and many that are missing.
In production, stakeholders claim that session_count is not available at scoring time (i.e., when you need to make the prediction), even though it appears in the schema.
Exploratory analysis shows session_count is negatively correlated with success.
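
To make the difference between a 0 and a missing session_count concrete, here is a minimal sketch in plain Python. It assumes each training row is a dict and that a missing value shows up as None or an absent key; the helper names profile_session_count and encode_session_count are hypothetical, and the sample rows are invented for illustration only.

```python
from collections import Counter

def profile_session_count(rows):
    """Count how often session_count is missing, zero, or positive."""
    counts = Counter()
    for row in rows:
        value = row.get("session_count")  # absent key is treated as missing
        if value is None:
            counts["missing"] += 1
        elif value == 0:
            counts["zero"] += 1
        else:
            counts["positive"] += 1
    return dict(counts)

def encode_session_count(value, fill_value=0):
    """Return (filled_value, is_missing) so a model can tell 0 from missing."""
    if value is None:
        return fill_value, 1
    return value, 0

# Invented sample rows (assumption: None or an absent key means "missing").
rows = [
    {"user_id": "a", "session_count": 0},
    {"user_id": "b", "session_count": None},
    {"user_id": "c", "session_count": 4},
    {"user_id": "d"},
]
profile = profile_session_count(rows)
encoded = [encode_session_count(r.get("session_count")) for r in rows]
```

Keeping a separate is_missing flag is one common choice, not the only one: it avoids silently merging "no sessions recorded" with "value not logged", which may have different causes.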

Questions

Data quality / missingness: How would you investigate why session_count is often 0 or missing, and how would you treat these cases during modeling?
Training-serving skew: If session_count is not available at inference time, what are your options? How do you decide whether to (a) drop it, (b) engineer a proxy, or (c) change the prediction timing / problem definition?
Interpretation: Provide at least two plausible explanations for the negative correlation between session_count and success (including an “opposite viewpoint”), and describe what additional data or analyses you would use to validate/refute each explanation.
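
One way to probe the interpretation question is to compare the pooled success rate with success rates inside strata of a suspected confounder. The sketch below is illustrative only: the complexity field is a hypothetical stand-in for something like return complexity, high_sessions is an assumed binarization of session_count, and the rows are synthetic toy data built to show how a pooled negative relationship can coexist with a positive one within each stratum.

```python
from collections import defaultdict

def success_rate_by(rows, keys):
    """Success rate for each combination of the given key columns."""
    totals = defaultdict(lambda: [0, 0])  # key tuple -> [successes, users]
    for row in rows:
        key = tuple(row[name] for name in keys)
        totals[key][0] += row["success"]
        totals[key][1] += 1
    return {key: wins / n for key, (wins, n) in totals.items()}

def make_rows(complexity, high_sessions, successes, n):
    """Build n synthetic users, the first `successes` of whom succeed."""
    return [
        {"complexity": complexity, "high_sessions": high_sessions,
         "success": 1 if i < successes else 0}
        for i in range(n)
    ]

# Synthetic toy data (hypothetical, not real product data).
rows = (
    make_rows("simple", False, 3, 4)
    + make_rows("simple", True, 1, 1)
    + make_rows("complex", False, 0, 1)
    + make_rows("complex", True, 1, 4)
)

pooled = success_rate_by(rows, ["high_sessions"])
stratified = success_rate_by(rows, ["complexity", "high_sessions"])
```

In this toy data the pooled rate is lower for high-session users, yet within each complexity stratum it is higher, which is the pattern a confounding explanation would predict. Real data would need adequate sample sizes per stratum before drawing any conclusion.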

Constraints / expectations

Assume you have standard product event logs available in principle (page views, step completions, timestamps), but instrumentation may be imperfect.
Your answer should cover: leakage risk, feature availability, and how you would communicate tradeoffs to stakeholders.
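
The leakage and feature-availability concerns can be illustrated with a point-in-time version of the feature, rebuilt from event logs. This is a sketch under stated assumptions: the event schema (user_id, session_id, ts) and the function name session_count_as_of are hypothetical, and the events are invented. The idea is to count only sessions observed at or before the scoring timestamp, so the training value matches what could be known in production.

```python
from datetime import datetime

def session_count_as_of(events, user_id, scoring_time):
    """Distinct sessions for a user seen at or before scoring_time."""
    sessions = {
        event["session_id"]
        for event in events
        if event["user_id"] == user_id and event["ts"] <= scoring_time
    }
    return len(sessions)

# Invented event log (assumed schema: user_id, session_id, ts).
events = [
    {"user_id": "u1", "session_id": "s1", "ts": datetime(2024, 1, 10, 9, 0)},
    {"user_id": "u1", "session_id": "s1", "ts": datetime(2024, 1, 10, 9, 5)},
    {"user_id": "u1", "session_id": "s2", "ts": datetime(2024, 1, 15, 18, 0)},
    {"user_id": "u1", "session_id": "s3", "ts": datetime(2024, 2, 1, 12, 0)},
    {"user_id": "u2", "session_id": "s9", "ts": datetime(2024, 1, 12, 8, 0)},
]

at_scoring = session_count_as_of(events, "u1", datetime(2024, 1, 20))
end_of_season = session_count_as_of(events, "u1", datetime(2024, 12, 31))
```

Here the cumulative value at the end of the season differs from the value at scoring time. Training on the former while serving the latter is one form of the training-serving skew the question asks about.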
