Churn Propensity with Logistic Regression: Theory, Validation, and Decisions
Context: You are building a churn propensity model (y ∈ {0,1}) using logistic regression for a subscription business. Positives (churners) are 3% of samples. Answer each part precisely and concisely.
1) Logistic regression likelihood and regularization
- Starting from a Bernoulli likelihood, derive the logistic regression log-likelihood and its gradient with respect to β.
- Show how L2 and L1 regularization change the objective from MLE to MAP. Write the new objective and gradients/subgradients (note: intercept typically unpenalized).
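A derived gradient can be sanity-checked numerically before it is trusted. The sketch below is a minimal illustration, not a reference solution: it assumes the L2-penalized objective is written as log-likelihood minus (λ/2)·Σβ_j² with the intercept stored in beta[0] and left unpenalized, and it compares a candidate analytic gradient against central finite differences on synthetic data. All function names and the toy data are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log1pexp(z):
    # Stable evaluation of log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def objective(beta, X, y, lam):
    # Bernoulli log-likelihood sum_i [y_i*z_i - log(1 + exp(z_i))], z_i = x_i . beta,
    # minus an L2 penalty that skips the intercept beta[0] (assumed convention).
    ll = 0.0
    for xi, yi in zip(X, y):
        z = sum(b * x for b, x in zip(beta, xi))
        ll += yi * z - log1pexp(z)
    return ll - 0.5 * lam * sum(b * b for b in beta[1:])

def candidate_gradient(beta, X, y, lam):
    # Candidate analytic gradient: X^T (y - p) - lam * beta, intercept unpenalized.
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        z = sum(b * x for b, x in zip(beta, xi))
        resid = yi - sigmoid(z)
        for j, x in enumerate(xi):
            grad[j] += resid * x
    for j in range(1, len(beta)):
        grad[j] -= lam * beta[j]
    return grad

def numerical_gradient(f, beta, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = []
    for j in range(len(beta)):
        up = list(beta)
        dn = list(beta)
        up[j] += eps
        dn[j] -= eps
        grad.append((f(up) - f(dn)) / (2 * eps))
    return grad

random.seed(0)
# Synthetic design matrix: a leading 1 for the intercept plus two Gaussian features.
X = [[1.0, random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]
y = [1.0 if random.random() < 0.3 else 0.0 for _ in range(50)]
beta = [-0.5, 0.8, -0.3]
lam = 2.0

analytic = candidate_gradient(beta, X, y, lam)
numeric = numerical_gradient(lambda b: objective(b, X, y, lam), beta)
max_abs_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_abs_diff)
```

The same check does not apply directly to the L1 penalty, which is not differentiable at zero; there a subgradient has to be verified coordinate by coordinate away from zero.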
2) OLS assumptions vs. GLMs
List the OLS assumptions. For each one that still matters in GLMs (e.g., no perfect multicollinearity, no omitted variables, no measurement error, IID observations), explain how a violation manifests in logistic regression and how regularization, feature engineering, or robust inference addresses it.
3) Class imbalance (3% positives)
- Compare class weighting vs. focal loss vs. threshold moving. How does each affect calibration?
- Describe a calibration check and a recalibration method (Platt vs. isotonic), and when you’d prefer each.
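One possible calibration check is a binned reliability table. The sketch below is illustrative only: it uses equal-count bins (an assumption that suits scores concentrated near zero, as with 3% positives) and reports a weighted absolute gap between mean predicted probability and observed rate. It also shows the odds correction that applies when positives were up-weighted by a factor w_pos during training, under the assumption that weighting acts as a pure shift in log-odds. Function names and toy inputs are hypothetical.

```python
def calibration_table(probs, labels, n_bins=10):
    # Sort by predicted probability and split into roughly equal-count bins.
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    rows = []
    for k in range(n_bins):
        chunk = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((len(chunk), mean_pred, obs_rate))
    return rows

def expected_calibration_error(rows):
    # Count-weighted mean absolute gap between predicted and observed rates.
    total = sum(count for count, _, _ in rows)
    return sum(count * abs(mp - obs) for count, mp, obs in rows) / total

def undo_class_weight(p_weighted, w_pos):
    # If positives were weighted by w_pos, fitted odds are inflated by about w_pos
    # (assumes a well-specified model); dividing the odds by w_pos undoes the shift.
    return p_weighted / (p_weighted + w_pos * (1.0 - p_weighted))

# Toy example: two groups whose observed rates match their predicted probabilities.
probs = [0.1] * 10 + [0.9] * 10
labels = [1] + [0] * 9 + [1] * 9 + [0]
rows = calibration_table(probs, labels, n_bins=2)
ece = expected_calibration_error(rows)

# A true 3% rate seen through a model trained with positives weighted 10x.
true_odds = 0.03 / 0.97
weighted = 10.0 * true_odds / (1.0 + 10.0 * true_odds)
recovered = undo_class_weight(weighted, 10.0)
print(rows, ece, weighted, recovered)
```

Threshold moving leaves the scores themselves untouched, so this check is mainly informative for weighted or focal-loss models, whose raw outputs generally need a correction or recalibration step before they are read as probabilities.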
4) Temporal validation without leakage
Define a temporal validation scheme that avoids leakage. Include: feature freeze date, out‑of‑time test window, and a k‑fold strategy that respects time ordering. Specify the exact splits on a 6‑month dataset.
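As one possible shape for such a scheme, the sketch below generates expanding-window folds over monthly periods and reserves the last period as the out-of-time test window. The month labels M1..M6, the minimum training length of two months, and the optional gap (a buffer covering the label horizon) are all assumptions made for illustration, not the only valid answer.

```python
def expanding_window_splits(periods, n_test=1, min_train=2, gap=0):
    # Reserve the final n_test periods as the untouched out-of-time test window.
    dev = periods[:-n_test]
    test = periods[-n_test:]
    folds = []
    # Each fold trains on all periods up to `end` and validates on the single
    # period that follows, after skipping `gap` buffer periods.
    for end in range(min_train, len(dev) - gap):
        train = dev[:end]
        valid = dev[end + gap:end + gap + 1]
        folds.append((train, valid))
    # Final refit uses the development periods, minus the gap before the test window.
    final_train = dev[:len(dev) - gap] if gap else dev
    return folds, final_train, test

months = ["M1", "M2", "M3", "M4", "M5", "M6"]
folds, final_train, test = expanding_window_splits(months)
for train, valid in folds:
    print("train", train, "validate", valid)
print("final train", final_train, "test", test)
```

Features for every row should be computed only from data available on or before that row's feature freeze date, with the label observed strictly afterwards; the split logic alone does not guarantee this.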
5) Correlated features and penalties
With correlated features, contrast L1 vs. L2 on sparsity, stability, and interpretability. Propose a workflow that yields a sparse, stable model with confidence intervals for odds ratios.
6) Business decision rule and value
Give one business‑aligned decision rule for choosing the score threshold using asymmetric costs, and show how to compute the expected value uplift over a "message all" policy.
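One way to make such a rule concrete is a break-even threshold: contact a customer only when the expected benefit exceeds the contact cost, i.e. p > C / B. The sketch below assumes calibrated churn probabilities, a fixed per-contact cost C, and a fixed net benefit B realized when a contacted customer would otherwise have churned; the cost figures and scores are made-up illustrative values.

```python
def breakeven_threshold(contact_cost, benefit_if_churner):
    # Contact when p * B - C > 0, which rearranges to p > C / B.
    return contact_cost / benefit_if_churner

def policy_value(probs, contact_cost, benefit_if_churner, threshold=None):
    # Expected value of contacting everyone above the threshold.
    # threshold=None means "message all".
    total = 0.0
    for p in probs:
        if threshold is None or p > threshold:
            total += p * benefit_if_churner - contact_cost
    return total

# Hypothetical inputs: four customers, cost 3 per message, benefit 100 per saved churner.
probs = [0.01, 0.02, 0.10, 0.50]
C, B = 3.0, 100.0

t = breakeven_threshold(C, B)
targeted = policy_value(probs, C, B, threshold=t)
message_all = policy_value(probs, C, B)
uplift = targeted - message_all
print(t, targeted, message_all, uplift)
```

The uplift equals the sum of (C - p·B) over the customers who are not contacted, so it is never negative at the break-even threshold; its size depends on how many low-propensity customers the "message all" policy would otherwise pay to reach.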