Design a scalable train/validation plan for churn
Company: HBO
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Take-home Project
Design a training/validation/evaluation plan for a weekly churn prediction model at a streaming service with 50M active users and severe class imbalance (weekly churn ≈0.3%). Available features include watch-time aggregates, recency, device, and payment signals, plus limited PII; data lives in a data lake. Constraints: training must finish in under 6 hours, and inference must score all 50M users in under 2 hours.
Cover:
(1) Time-based splits to prevent leakage: sliding-window training, validation on the next week, test on the following week.
(2) Handling imbalance (negative downsampling with class-weight correction, focal loss, or calibrated thresholds) and why PR-AUC or Recall@K may be preferable to ROC-AUC.
(3) Distributed or out-of-core training options (e.g., XGBoost on Spark, sparse logistic regression) and how to tune hyperparameters efficiently (bandit/ASHA) without leakage.
(4) Feature-leakage audits (e.g., removing post-label signals such as refund flags) and feature-store versioning.
(5) Calibration and decisioning: Platt/isotonic scaling, cost-sensitive thresholds, decile stability.
(6) Offline–online consistency checks and drift monitoring (PSI, KS tests, SHAP distribution shifts).
(7) An experiment plan to validate business lift (uplift modeling or targeting thresholds) and how to size the holdout.
(8) How to scale inference (vectorized joins, pre-aggregation, incremental updates) and backfills for late-arriving events.
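Point (1) can be sketched as a simple week-index filter. The row schema (`week` key) and the 8-week window length below are illustrative assumptions, not part of the prompt:

```python
# Sketch of a sliding-window time split, assuming each user-week row
# carries an integer week index. TRAIN_WEEKS is an illustrative choice.
TRAIN_WEEKS = 8  # length of the sliding training window

def time_split(rows, eval_week):
    """Train on the TRAIN_WEEKS weeks before eval_week, validate on
    eval_week, test on eval_week + 1 -- no future data reaches training."""
    train = [r for r in rows if eval_week - TRAIN_WEEKS <= r["week"] < eval_week]
    valid = [r for r in rows if r["week"] == eval_week]
    test  = [r for r in rows if r["week"] == eval_week + 1]
    return train, valid, test

# Toy data: weeks 1..12, three users per week.
rows = [{"week": w, "user": u} for w in range(1, 13) for u in range(3)]
train, valid, test = time_split(rows, eval_week=10)
# train covers weeks 2-9, valid week 10, test week 11
```

Rolling `eval_week` forward across history gives multiple leakage-safe folds for hyperparameter tuning.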
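For the negative-downsampling correction in point (2), the standard odds adjustment maps sampled-space probabilities back to the true base-rate scale; the function name and rates here are illustrative:

```python
def correct_downsampled_prob(p, keep_rate):
    """Map a probability from a model trained on negatives downsampled at
    `keep_rate` back to the full-population scale. Downsampling negatives
    by keep_rate inflates the model's odds by 1/keep_rate, so we deflate:
    p_true = p / (p + (1 - p) / keep_rate)."""
    return p / (p + (1.0 - p) / keep_rate)

# With 1-in-200 negative downsampling, a sampled-space score of 0.5
# maps back to roughly the 0.3%-churn base-rate scale:
print(round(correct_downsampled_prob(0.5, 1 / 200), 6))  # 0.004975
```

With no downsampling (`keep_rate=1.0`) the correction is the identity, which is a quick sanity check.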
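Recall@K from point (2) states the business question directly: of all true churners, how many land in the K users a retention team can actually target? A toy sketch, assuming scores and labels as parallel lists:

```python
def recall_at_k(scores, labels, k):
    """Fraction of all positives captured in the top-k scored users."""
    top_k = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    captured = sum(labels[i] for i in top_k)
    total_pos = sum(labels)
    return captured / total_pos if total_pos else 0.0

scores = [0.9, 0.8, 0.1, 0.7, 0.05]
labels = [1,   0,   1,   1,   0]
print(recall_at_k(scores, labels, k=2))  # top-2 captures 1 of 3 positives
```

Unlike ROC-AUC, this metric does not reward correct ordering among the 99.7% of users who were never going to be contacted.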
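The cost-sensitive threshold in point (5) follows from expected value: contact a user only when the calibrated churn probability times the value of a save exceeds the cost of the contact. The dollar figures below are made-up placeholders:

```python
def decision_threshold(cost_contact, value_saved):
    """Contact a user when expected saved value exceeds the contact cost:
    p * value_saved > cost_contact  =>  p > cost_contact / value_saved."""
    return cost_contact / value_saved

# If a retention offer costs $2 and a saved subscriber is worth $80,
# flag users whose calibrated churn probability exceeds 0.025:
print(decision_threshold(2.0, 80.0))  # 0.025
```

This only works on calibrated probabilities, which is why Platt/isotonic scaling precedes thresholding.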
Quick Answer: This question evaluates a data scientist's ability to design a leakage-safe, scalable pipeline for training, validating, evaluating, and serving an imbalanced weekly churn model. It spans time-based splits, class-imbalance handling and metric choice, distributed or out-of-core training, feature-leakage audits, calibration and decisioning, drift monitoring, experimentation, and scalable inference. Interviewers use it to assess practical and architectural judgment in production machine learning: balancing data engineering, model evaluation, compute constraints, and operational monitoring. It falls under Machine Learning/Data Science and blends conceptual understanding with practical application.
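The PSI drift check mentioned above fits in a few lines; the bin fractions and the `eps` smoothing constant below are illustrative:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions
    (bin fractions summing to 1). Common rule of thumb: < 0.1 stable,
    0.1-0.25 moderate shift, > 0.25 major shift worth investigating."""
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

baseline  = [0.25, 0.25, 0.25, 0.25]  # score deciles at training time
this_week = [0.30, 0.25, 0.25, 0.20]  # score deciles in production
print(round(psi(baseline, this_week), 4))
```

Running this weekly per feature and per score decile catches both input drift and score drift before business metrics degrade.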