Weekly Churn Prediction: Training/Validation/Evaluation Plan
Context
You are building a weekly churn prediction model for a streaming service with:
- 50M active users per week
- Severe class imbalance: weekly churn ≈ 0.3%
- Features: watch-time aggregates, recency, device, payments, limited PII
- Data stored in a data lake
- Compute constraints:
  - Model training must finish in < 6 hours
  - Batch inference must score 50M users in < 2 hours
Task
Design an end-to-end plan for training, validating, evaluating, and serving the weekly churn model. Your plan must cover the points below (a brief illustrative sketch for each follows the list):
- Time-based data splits to prevent leakage (e.g., sliding-window training; validate on the next week; test on the following week)
- Handling class imbalance (negative downsampling with class-weight correction, focal loss, calibrated thresholds) and why PR-AUC/Recall@K are preferable to ROC-AUC
- Distributed or out-of-core training options (e.g., XGBoost on Spark, sparse logistic regression) and efficient, leakage-safe hyperparameter tuning (bandit/ASHA)
- Feature-leakage audits (e.g., removing post-label signals such as refund flags) and feature store versioning
- Calibration and decisioning (Platt/isotonic, cost-sensitive thresholds, decile stability)
- Offline–online consistency checks and drift monitoring (PSI, KS, population stability, SHAP distribution shifts)
- An experiment plan to validate business lift (uplift modeling or targeting thresholds) and how to size the holdout
- How to scale inference (vectorized joins, pre-aggregation, incremental updates) and how to handle backfills for late-arriving events
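Illustrative sketches

For the time-based splits: a minimal sketch of the sliding-window scheme, assuming a feature table with one row per (user, week) and a `label_week` column marking the week in which the churn label was observed; the column name and window length are hypothetical.

```python
import pandas as pd

def time_based_split(df: pd.DataFrame, test_week: str, train_weeks: int = 8):
    """Sliding-window split: train on the `train_weeks` weeks ending two weeks
    before `test_week`, validate on the week just before `test_week`, and test
    on `test_week` itself, so no split ever sees a later week's labels."""
    weeks = sorted(df["label_week"].unique())
    test_idx = weeks.index(test_week)
    val_week = weeks[test_idx - 1]
    train_window = weeks[max(0, test_idx - 1 - train_weeks): test_idx - 1]

    train = df[df["label_week"].isin(train_window)]
    val = df[df["label_week"] == val_week]
    test = df[df["label_week"] == test_week]
    return train, val, test
```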
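For class imbalance and metrics: one common recipe is to train on all churners plus a small sample of non-churners, then undo the prior shift before calibration. The sketch below also includes Recall@K, which reflects a fixed-budget retention campaign better than ROC-AUC does at a 0.3% base rate. The 2% keep rate and column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import average_precision_score  # PR-AUC (average precision)

NEG_KEEP_RATE = 0.02  # keep 2% of negatives; illustrative, not from the brief

def downsample_negatives(df: pd.DataFrame, rate: float = NEG_KEEP_RATE,
                         seed: int = 0) -> pd.DataFrame:
    """Keep every churner and a `rate` fraction of non-churners."""
    pos = df[df["label"] == 1]
    neg = df[df["label"] == 0].sample(frac=rate, random_state=seed)
    return pd.concat([pos, neg]).sample(frac=1.0, random_state=seed)

def correct_probs(p_sampled: np.ndarray, rate: float = NEG_KEEP_RATE) -> np.ndarray:
    """Undo the prior shift from downsampling: p = r*p' / (r*p' + 1 - p')."""
    return rate * p_sampled / (rate * p_sampled + 1.0 - p_sampled)

def recall_at_k(y_true: np.ndarray, scores: np.ndarray, k: int) -> float:
    """Share of actual churners captured in the top-k scored users."""
    top_k = np.argsort(-scores)[:k]
    return float(y_true[top_k].sum() / max(y_true.sum(), 1))
```

The correction is monotone, so it leaves PR-AUC and Recall@K unchanged; it matters once calibrated probabilities feed cost-sensitive thresholds.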
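For tuning: a synchronous successive-halving sketch of the ASHA idea, kept leakage-safe by scoring every candidate on the same fixed next-week validation set rather than shuffled K-fold. Single-node xgboost is used for brevity; the same loop applies when training is distributed (e.g., XGBoost on Spark).

```python
import xgboost as xgb
from sklearn.metrics import average_precision_score

def successive_halving(candidates, dtrain, dval, y_val,
                       rounds=(50, 150, 450), keep_frac=1 / 3):
    """Score every surviving param dict at a growing boosting-round budget and
    keep the top `keep_frac` after each rung; `dtrain`/`dval` are DMatrix objects
    built from the time-based train and validation weeks."""
    survivors = list(candidates)
    for num_round in rounds:
        scored = []
        for params in survivors:
            booster = xgb.train(params, dtrain, num_boost_round=num_round)
            pr_auc = average_precision_score(y_val, booster.predict(dval))
            scored.append((pr_auc, params))
        scored.sort(key=lambda t: t[0], reverse=True)
        survivors = [p for _, p in scored[: max(1, int(len(scored) * keep_frac))]]
    return survivors[0]
```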
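For the leakage audit: besides tracing each feature's event timestamps against the label cutoff, a cheap automated check is to score each feature on its own; anything that alone separates churners almost perfectly (a post-label refund flag, for instance) deserves scrutiny. The 0.90 cutoff is an illustrative assumption and the scan only covers numeric columns.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def single_feature_leakage_scan(df: pd.DataFrame, numeric_features: list,
                                label_col: str = "label",
                                flag_above: float = 0.90) -> pd.DataFrame:
    """Rank numeric features by their standalone, direction-agnostic AUC against
    the label; suspiciously high values usually mean the feature was computed
    after the churn event."""
    rows = []
    for col in numeric_features:
        vals = df[col].fillna(df[col].median())
        auc = roc_auc_score(df[label_col], vals)
        auc = max(auc, 1.0 - auc)
        rows.append({"feature": col, "solo_auc": auc, "flagged": auc >= flag_above})
    return pd.DataFrame(rows).sort_values("solo_auc", ascending=False)
```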
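For calibration and decisioning: a sketch of isotonic calibration fitted on the validation week, plus a cost-sensitive cutoff derived from an assumed cost structure (the dollar figures and save rate below are placeholders, not numbers from the brief). Decile stability is then checked by comparing realized churn per calibrated-score decile across weeks.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_calibrator(val_scores: np.ndarray, val_labels: np.ndarray) -> IsotonicRegression:
    """Isotonic calibration fitted on validation-week scores only, applied after
    any downsampling correction."""
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(val_scores, val_labels)
    return iso

def decision_threshold(contact_cost: float, retained_value: float,
                       save_rate: float) -> float:
    """Contact a user when expected saved value exceeds the intervention cost:
    p * save_rate * retained_value > contact_cost."""
    return contact_cost / (save_rate * retained_value)

# Placeholder economics: $0.50 outreach, $120 retained value, 10% save rate
# decision_threshold(0.50, 120.0, 0.10)  ->  ~0.042 calibrated-probability cutoff
```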
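For drift monitoring: a self-contained PSI implementation that can be run weekly on each feature and on the model score itself, comparing the current scoring week against the training-time reference.

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference (training-time) distribution
    and the current scoring week. A common rule of thumb treats PSI > 0.2 as
    meaningful drift; pick the alert level to suit your own tolerance."""
    cuts = np.unique(np.quantile(reference, np.linspace(0, 1, bins + 1)))
    cuts[0], cuts[-1] = -np.inf, np.inf            # catch values outside the reference range
    ref_frac = np.histogram(reference, cuts)[0] / len(reference)
    cur_frac = np.histogram(current, cuts)[0] / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)       # avoid log(0) on empty bins
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))
```

scipy.stats.ks_2samp covers the KS check, and SHAP value distributions can be compared with the same PSI function.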
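For sizing the experiment holdout: a standard two-proportion power calculation; the 0.3% base rate comes from the brief, while the 5% minimum detectable effect, 5% significance level, and 80% power are illustrative assumptions.

```python
from math import ceil
from scipy.stats import norm

def holdout_size_per_arm(base_rate: float = 0.003, rel_effect: float = 0.05,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per arm needed to detect a `rel_effect` relative change in weekly
    churn with a two-sided test, using the normal approximation."""
    delta = base_rate * rel_effect
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    p_bar = base_rate * (1 + rel_effect / 2)       # average rate across the two arms
    n = 2 * z ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return ceil(n)

# holdout_size_per_arm() -> roughly 2.1M users per arm at these settings,
# i.e. about 4% of the 50M weekly population.
```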
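For batch inference at 50M users: a PySpark sketch that broadcasts the trained model and scores the feature table with a vectorized pandas UDF; the paths, week partition, and three feature columns are hypothetical. Backfills for late-arriving events can rerun the same job against the affected week partitions.

```python
import pandas as pd
import xgboost as xgb
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("weekly_churn_scoring").getOrCreate()

booster = xgb.Booster()
booster.load_model("churn_model.json")              # hypothetical artifact path
bc_model = spark.sparkContext.broadcast(booster)    # ship the model to executors once

@pandas_udf("double")
def score_udf(watch_minutes_7d: pd.Series, days_since_last_play: pd.Series,
              payment_failures_30d: pd.Series) -> pd.Series:
    # Each call receives a batch of rows as pandas Series, so the table is
    # scored partition by partition with no per-row Python loop.
    X = pd.DataFrame({
        "watch_minutes_7d": watch_minutes_7d,
        "days_since_last_play": days_since_last_play,
        "payment_failures_30d": payment_failures_30d,
    })
    return pd.Series(bc_model.value.predict(xgb.DMatrix(X)))

features = spark.read.parquet("s3://lake/churn_features/week=2024-06-02")   # hypothetical path
scored = features.withColumn(
    "churn_score",
    score_udf("watch_minutes_7d", "days_since_last_play", "payment_failures_30d"),
)
scored.write.mode("overwrite").parquet("s3://lake/churn_scores/week=2024-06-02")
```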