You receive a TurboTax sample dataset (user-level and/or session-level) and are asked to build a predictive model.
Task
-
Pick a concrete prediction target (choose one and justify):
-
Probability a user will
file
within 14 days of starting.
-
Probability a user will
churn
(not file this season).
-
Expected
revenue
from the user this season.
-
Describe how you would:
-
Define labels and avoid label leakage.
-
Build features from product interactions and historical attributes.
-
Choose a baseline model and at least one stronger model.
-
Evaluate performance (metrics, calibration, slice performance).
-
Turn the model into an actionable recommendation (e.g., targeting, prioritization, interventions).
Constraints / realism
-
Data may be missing or delayed.
-
Class imbalance is likely (e.g., churn).
-
The business cares about interpretability and safe deployment.