Predicting Contribution per Order with Low R²
Context
You are modeling contribution per order (a continuous per-order outcome such as margin or profit contribution) using a linear regression. The current model achieves R² = 0.07, indicating weak predictive performance. You care about both prediction accuracy and valid inference on key covariates (e.g., treatment effects, policy variables).
Tasks
(a) List concrete, practical steps to raise predictive performance without invalidating inference. Include:
- Feature transformations (e.g., splines for basket size).
- Interactions (e.g., treatment × daypart).
- Appropriate error distribution/link (e.g., Gamma with log link) and when to use them.
- Systematic leakage checks.
(b) Will simply adding another covariate reliably increase R² out-of-sample? Use cross-validation (CV) to demonstrate why or why not, and propose alternatives (GAMs, quantile regression, gradient boosting) that balance predictive performance with effect-estimation goals.
(c) Show how to use nested cross-validation and target-leakage tests to guard against p-hacking while iterating on features/hyperparameters.
(d) Explain when a low R² is acceptable for an unbiased average treatment effect (ATE) but unacceptable for accurate individual predictions.
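As a starting point for (b) and (d), the following is a minimal sketch on simulated data, not a reference solution. It assumes a randomized binary treatment, one standardized basket-size feature, and a deliberately irrelevant extra covariate; every variable name, effect size, and the noise level are hypothetical choices made for illustration. It uses plain NumPy least squares and a hand-rolled k-fold split rather than any particular modeling library.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def r2(y, y_hat):
    """Coefficient of determination relative to the mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def cv_r2(X, y, k=5, seed=0):
    """Mean out-of-fold R² over k shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = ols_fit(X[train], y[train])
        scores.append(r2(y[test], X[test] @ beta))
    return float(np.mean(scores))

# Simulated orders (all values are assumptions for illustration).
rng = np.random.default_rng(42)
n = 5000
treat = rng.integers(0, 2, size=n).astype(float)  # randomized 0/1 treatment
basket = rng.normal(size=n)                       # standardized basket size
noise_cov = rng.normal(size=n)                    # covariate unrelated to y
true_ate = 1.0
y = 2.0 + true_ate * treat + 0.5 * basket + rng.normal(scale=4.0, size=n)

X_base = np.column_stack([np.ones(n), treat, basket])
X_more = np.column_stack([X_base, noise_cov])

beta_base = ols_fit(X_base, y)
ate_hat = beta_base[1]  # coefficient on the treatment column
in_r2_base = r2(y, X_base @ beta_base)
in_r2_more = r2(y, X_more @ ols_fit(X_more, y))
rmse = float(np.sqrt(np.mean((y - X_base @ beta_base) ** 2)))

cv_base = cv_r2(X_base, y)
cv_more = cv_r2(X_more, y)

print(f"ATE estimate: {ate_hat:.3f} (simulated truth {true_ate})")
print(f"In-sample R²: base {in_r2_base:.4f}, with extra covariate {in_r2_more:.4f}")
print(f"5-fold CV R²: base {cv_base:.4f}, with extra covariate {cv_more:.4f}")
print(f"Per-order RMSE: {rmse:.2f}")
```

Under these assumptions the treatment coefficient lands near the simulated effect while R² stays well below 0.1, because randomization, not goodness of fit, is what protects the ATE from bias. The per-order RMSE stays close to the noise standard deviation, which is why the same model is poor at individual predictions. In-sample R² cannot fall when a column is added to an OLS fit, whereas the cross-validated R² carries no such guarantee.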