End-to-End Conversion Modeling on a Raw Behavioral Dataset
Scenario
You receive a raw, event-level behavioral dataset (e.g., user actions, sessions, marketing touches) for a product funnel. Your goal is to predict whether a user converts within a defined window after an anchor time (e.g., first app open → completes sign-up or makes first purchase within 14 days). Assume the data includes timestamps, user/session IDs, event types, basic device/geo/campaign attributes, and may contain missing values and high-cardinality categories.
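To make the target definition concrete, here is a minimal sketch of user-level label construction under the scenario's example (first app open, conversion within 14 days). The event names, the `(user_id, timestamp, event_type)` tuple layout, and the `data_end` cutoff are assumptions for illustration, not part of the dataset description. Dropping users whose label window is not fully observed is one reasonable choice, not the only one.

```python
from datetime import datetime, timedelta

LABEL_WINDOW = timedelta(days=14)             # window length from the scenario's example
ANCHOR_EVENT = "app_open"                     # hypothetical event name
CONVERSION_EVENTS = {"sign_up", "purchase"}   # hypothetical event names


def build_labels(events, data_end):
    """Return {user_id: (anchor_time, label)} with one row per user.

    events:   list of (user_id, timestamp, event_type) tuples.
    data_end: last timestamp covered by the extract. Users whose label
              window extends past it are dropped, because their outcome
              is not fully observed yet.
    """
    # Anchor = each user's earliest anchor event (this also deduplicates to user level).
    anchors = {}
    for user_id, ts, event_type in events:
        if event_type == ANCHOR_EVENT:
            if user_id not in anchors or ts < anchors[user_id]:
                anchors[user_id] = ts

    # Start every eligible user at label 0.
    labels = {}
    for user_id, anchor in anchors.items():
        if anchor + LABEL_WINDOW <= data_end:
            labels[user_id] = (anchor, 0)

    # Flip to 1 if any conversion event falls inside (anchor, anchor + window].
    for user_id, ts, event_type in events:
        if user_id in labels and event_type in CONVERSION_EVENTS:
            anchor = labels[user_id][0]
            if anchor < ts <= anchor + LABEL_WINDOW:
                labels[user_id] = (anchor, 1)
    return labels


events = [
    ("u1", datetime(2024, 1, 1, 9), "app_open"),
    ("u1", datetime(2024, 1, 5, 12), "purchase"),  # inside the 14-day window
    ("u2", datetime(2024, 1, 2, 8), "app_open"),
    ("u2", datetime(2024, 1, 20, 8), "sign_up"),   # 18 days after the anchor
    ("u3", datetime(2024, 1, 25, 8), "app_open"),  # window runs past data_end
]
labels = build_labels(events, data_end=datetime(2024, 1, 31))
```

In this toy input, `u1` is labeled 1, `u2` is labeled 0, and `u3` is excluded because its window is not fully observed.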
Task
Walk through your approach live:
- Clarify problem setup
  - Define the prediction target, prediction time, and label window.
  - Choose the unit of analysis (user-level or session-level) and deduplicate.
  - Identify and avoid potential label leakage.
- Exploratory Data Analysis (EDA)
  - Inspect schema, missingness, extremes, and class imbalance.
  - Explore univariate/bivariate relationships; time trends and seasonality.
  - Check high-cardinality categoricals and feature distributions.
- Feature Engineering and Preprocessing
  - Propose features from behavioral events (recency/frequency, funnel steps, marketing, device/geo).
  - Handle missing values, encode categoricals, and scale/normalize as appropriate.
- Modeling
  - Start with a baseline (e.g., majority class, simple logistic regression), then a stronger model (e.g., gradient boosting).
  - Describe training/validation/test split strategy (preferably time-based) and cross-validation.
- Evaluation and Interpretation
  - Report performance using ROC AUC, PR AUC, log loss, calibration, and lift/gains.
  - Interpret coefficients or feature importances; discuss threshold selection.
- Improvements and Experimentation
  - Recommend feature and model improvements; address data quality.
  - Propose how to use the model in an experiment; guardrails and monitoring.
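Two of the steps above, avoiding label leakage and splitting by time, can be illustrated with a short sketch. It assumes the same hypothetical `(user_id, timestamp, event_type)` event tuples, a prediction time chosen by the modeler (here one day after the anchor), and rows carrying an `anchor_time` field. None of these names come from the dataset itself.

```python
from datetime import datetime, timedelta


def features_at(events, user_id, prediction_time):
    """Recency/frequency features built only from events at or before prediction_time.

    Filtering on the timestamp is what prevents leakage: anything that
    happens after the prediction time (including the conversion itself)
    never reaches the feature vector.
    """
    past = [ts for uid, ts, _ in events if uid == user_id and ts <= prediction_time]
    if not past:
        return {"n_events": 0, "hours_since_last": None}  # missing recency kept explicit
    hours = (prediction_time - max(past)).total_seconds() / 3600
    return {"n_events": len(past), "hours_since_last": hours}


def time_split(rows, train_end, valid_end):
    """Chronological split on anchor_time: train < train_end <= valid < valid_end <= test."""
    train = [r for r in rows if r["anchor_time"] < train_end]
    valid = [r for r in rows if train_end <= r["anchor_time"] < valid_end]
    test = [r for r in rows if r["anchor_time"] >= valid_end]
    return train, valid, test


events = [
    ("u1", datetime(2024, 1, 1, 9), "app_open"),
    ("u1", datetime(2024, 1, 1, 10), "view_item"),  # hypothetical event type
    ("u1", datetime(2024, 1, 5, 12), "purchase"),   # after prediction time, must be ignored
]
prediction_time = datetime(2024, 1, 1, 9) + timedelta(days=1)
feats = features_at(events, "u1", prediction_time)

rows = [
    {"user_id": "u1", "anchor_time": datetime(2024, 1, 1)},
    {"user_id": "u2", "anchor_time": datetime(2024, 1, 10)},
    {"user_id": "u3", "anchor_time": datetime(2024, 1, 20)},
]
train, valid, test = time_split(rows, datetime(2024, 1, 8), datetime(2024, 1, 15))
```

A stricter variant would also leave a gap of one label window between splits, so that every training label is fully observed before the first validation anchor.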
Hints
- Discuss missing-value handling, train/validation split, baseline models, ROC/AUC or lift, and feature engineering iterations.
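For the ROC AUC and lift discussion, it can help to show what the two numbers measure. The sketch below is a plain-Python illustration on made-up labels and scores, not a replacement for a library implementation. It also scores a constant-prediction baseline, which is the ranking equivalent of the majority-class model.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at(y_true, scores, fraction):
    """Conversion rate among the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Made-up validation labels and model scores, for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.3, 0.5, 0.2, 0.35, 0.05]

model_auc = roc_auc(y_true, scores)
baseline_auc = roc_auc(y_true, [0.3] * len(y_true))  # constant score: no ranking ability
top20_lift = lift_at(y_true, scores, fraction=0.2)
```

Here the constant baseline scores an AUC of 0.5, and the lift compares the conversion rate in the top 20% of scores against the 30% base rate of the toy labels.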