Design a scalable train/validation plan for churn
Company: HBO
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Take-home Project
Design a training/validation/evaluation plan for a weekly churn prediction model at a streaming service with 50M active users and severe class imbalance (weekly churn ≈0.3%). Available features include watch-time aggregates, recency, device, and payment signals, plus limited PII; data lives in a data lake. Constraints: training must finish in under 6 hours, and inference must score all 50M users in under 2 hours.
Cover:
(1) Time-based splits to prevent leakage: sliding-window training, validation on the next week, test on the following week.
(2) Handling imbalance (negative downsampling with class-weight correction, focal loss, or calibrated thresholds) and why PR-AUC or Recall@K may be preferable to ROC-AUC.
(3) Distributed or out-of-core training options (e.g., XGBoost on Spark, sparse logistic regression) and how to tune hyperparameters efficiently (bandit/ASHA) without leakage.
(4) Feature-leakage audits (e.g., removing post-label signals such as refund flags) and feature-store versioning.
(5) Calibration and decisioning: Platt/isotonic scaling, cost-sensitive thresholds, decile stability.
(6) Offline–online consistency checks and drift monitoring (PSI, KS tests, SHAP distribution shifts).
(7) An experiment plan to validate business lift (uplift modeling or targeting thresholds) and how to size the holdout.
(8) How to scale inference (vectorized joins, pre-aggregation, incremental updates) and backfills for late-arriving events.
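Point (1) can be sketched as a simple week-index filter. The row schema (`week` key) and the 8-week window length below are illustrative assumptions, not part of the prompt:

```python
# Sketch of a sliding-window time split, assuming each user-week row
# carries an integer week index. TRAIN_WEEKS is an illustrative choice.
TRAIN_WEEKS = 8  # length of the sliding training window

def time_split(rows, eval_week):
    """Train on the TRAIN_WEEKS weeks before eval_week, validate on
    eval_week, test on eval_week + 1 -- no future data reaches training."""
    train = [r for r in rows if eval_week - TRAIN_WEEKS <= r["week"] < eval_week]
    valid = [r for r in rows if r["week"] == eval_week]
    test  = [r for r in rows if r["week"] == eval_week + 1]
    return train, valid, test

# Toy data: weeks 1..12, three users per week.
rows = [{"week": w, "user": u} for w in range(1, 13) for u in range(3)]
train, valid, test = time_split(rows, eval_week=10)
# train covers weeks 2-9, valid week 10, test week 11
```

Rolling `eval_week` forward across history gives multiple leakage-safe folds for hyperparameter tuning.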
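For the negative-downsampling correction in point (2), the standard odds adjustment maps sampled-space probabilities back to the true base-rate scale; the function name and rates here are illustrative:

```python
def correct_downsampled_prob(p, keep_rate):
    """Map a probability from a model trained on negatives downsampled at
    `keep_rate` back to the full-population scale. Downsampling negatives
    by keep_rate inflates the model's odds by 1/keep_rate, so we deflate:
    p_true = p / (p + (1 - p) / keep_rate)."""
    return p / (p + (1.0 - p) / keep_rate)

# With 1-in-200 negative downsampling, a sampled-space score of 0.5
# maps back to roughly the 0.3%-churn base-rate scale:
print(round(correct_downsampled_prob(0.5, 1 / 200), 6))  # 0.004975
```

With no downsampling (`keep_rate=1.0`) the correction is the identity, which is a quick sanity check.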
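Recall@K from point (2) states the business question directly: of all true churners, how many land in the K users a retention team can actually target? A toy sketch, assuming scores and labels as parallel lists:

```python
def recall_at_k(scores, labels, k):
    """Fraction of all positives captured in the top-k scored users."""
    top_k = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    captured = sum(labels[i] for i in top_k)
    total_pos = sum(labels)
    return captured / total_pos if total_pos else 0.0

scores = [0.9, 0.8, 0.1, 0.7, 0.05]
labels = [1,   0,   1,   1,   0]
print(recall_at_k(scores, labels, k=2))  # top-2 captures 1 of 3 positives
```

Unlike ROC-AUC, this metric does not reward correct ordering among the 99.7% of users who were never going to be contacted.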
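The cost-sensitive threshold in point (5) follows from expected value: contact a user only when the calibrated churn probability times the value of a save exceeds the cost of the contact. The dollar figures below are made-up placeholders:

```python
def decision_threshold(cost_contact, value_saved):
    """Contact a user when expected saved value exceeds the contact cost:
    p * value_saved > cost_contact  =>  p > cost_contact / value_saved."""
    return cost_contact / value_saved

# If a retention offer costs $2 and a saved subscriber is worth $80,
# flag users whose calibrated churn probability exceeds 0.025:
print(decision_threshold(2.0, 80.0))  # 0.025
```

This only works on calibrated probabilities, which is why Platt/isotonic scaling precedes thresholding.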
Quick Answer: This question evaluates a data scientist's ability to design a leakage-safe, scalable pipeline for training, validating, evaluating, and serving an imbalanced weekly churn model. It spans time-based splits, class-imbalance handling and metric choice, distributed or out-of-core training, feature-leakage audits, calibration and decisioning, drift monitoring, experimentation, and scalable inference. Interviewers use it to assess practical and architectural judgment in production machine learning: balancing data engineering, model evaluation, compute constraints, and operational monitoring. It falls under Machine Learning/Data Science and blends conceptual understanding with practical application.
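The PSI drift check mentioned above fits in a few lines; the bin fractions and the `eps` smoothing constant below are illustrative:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions
    (bin fractions summing to 1). Common rule of thumb: < 0.1 stable,
    0.1-0.25 moderate shift, > 0.25 major shift worth investigating."""
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

baseline  = [0.25, 0.25, 0.25, 0.25]  # score deciles at training time
this_week = [0.30, 0.25, 0.25, 0.20]  # score deciles in production
print(round(psi(baseline, this_week), 4))
```

Running this weekly per feature and per score decile catches both input drift and score drift before business metrics degrade.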