This question evaluates a data scientist's ability to design scalable, leakage-safe training, validation, evaluation, and serving pipelines for imbalanced weekly churn prediction. It touches on time-based splits, class imbalance and evaluation metrics, distributed or out-of-core training, feature leakage audits, calibration and decisioning, drift monitoring, experimentation, and scalable inference. It is commonly asked to assess practical and architectural judgment in production machine learning (balancing data engineering, model evaluation, compute constraints, and operational monitoring) and falls under Machine Learning/Data Science, blending conceptual understanding with practical application.
You are building a weekly churn prediction model for a streaming service with:
Design an end-to-end plan for training, validating, evaluating, and serving the weekly churn model. Your plan must cover: