This question evaluates competency in applied machine learning for regression under covariate shift, focusing on diagnosing missing support, robust modeling, and distribution-shift mitigation for an unseen low-income bracket.
You're building a supervised regression model to predict California housing prices using a dataset similar to the classic California Housing data. One key covariate is household income. The training data contains no observations from the lowest-income bracket (< $25k), but the deployed model must perform well across all income ranges, including this unseen bracket at inference time.
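One way to estimate how a model behaves on an income bracket it has never seen is to simulate the gap inside the labeled data. Hold out the lowest-income slice of the training rows, fit on the rest, and measure error on the held-out slice. The sketch below is a minimal illustration, assuming incomes are plain numbers. It uses a hypothetical helper `extrapolation_split` and a hypothetical 10% default holdout fraction, and it only produces the index split. Fitting and scoring are left to whatever regressor you use.

```python
from typing import List, Sequence, Tuple


def extrapolation_split(
    incomes: Sequence[float], frac: float = 0.1
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: hold out the lowest-income `frac` of rows.

    The held-out rows stand in for the unseen < $25k bracket, so error on
    them approximates how well the model extrapolates below its support.
    Returns (fit_idx, holdout_idx).
    """
    # Sort row indices by income, lowest first.
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    # Always hold out at least one row.
    k = max(1, int(len(incomes) * frac))
    return order[k:], order[:k]
```

Because the holdout is defined by income rather than drawn at random, a low error on it is weak evidence of extrapolation ability. A random split would mostly measure interpolation.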
Assume the deployment/test distribution will span the full income range, including < $25k. You may also have access to unlabeled production covariates (features only) that include the missing bracket.
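When unlabeled production covariates are available, a first diagnostic is to count how much production mass falls where the training data has no support. The sketch below assumes incomes in thousands of dollars and uses hypothetical bracket edges. `support_gap_report` is an illustrative name, not a library function.

```python
from typing import Sequence


def support_gap_report(
    train_income: Sequence[float],
    prod_income: Sequence[float],
    brackets: Sequence[float] = (25.0, 50.0, 75.0, 100.0),
) -> dict:
    """Hypothetical helper: compare training vs. production income coverage.

    Incomes are assumed to be in thousands of dollars. `brackets` are the
    upper edges of the income bins, and the last bin is open-ended.
    """
    lo, hi = min(train_income), max(train_income)
    # Production rows outside [lo, hi] have no training support at all.
    outside = [x for x in prod_income if x < lo or x > hi]
    report = {
        "train_range": (lo, hi),
        "frac_prod_outside_support": len(outside) / len(prod_income),
        "bins": [],
    }
    edges = [float("-inf"), *brackets, float("inf")]
    for left, right in zip(edges[:-1], edges[1:]):
        # Count rows in the half-open bin [left, right).
        n_tr = sum(left <= x < right for x in train_income)
        n_pr = sum(left <= x < right for x in prod_income)
        report["bins"].append({"bin": (left, right), "n_train": n_tr, "n_prod": n_pr})
    return report
```

A bin with `n_train == 0` but `n_prod > 0` (for example, the `< 25` bin) is the missing-support case the prompt describes. Reweighting cannot fix that bin, because there is nothing in it to reweight.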
Design a modeling approach that achieves robust performance across all income ranges, with special attention to the unseen lowest-income bracket. Your answer should cover:
You may reference techniques such as domain similarity, incremental retraining, covariate shift correction, transfer learning, and feature scaling.
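Covariate shift correction is often done by importance weighting. The procedure has three steps:

1. Train a domain classifier to separate training rows from unlabeled production rows.
2. Convert its probabilities into estimates of the density ratio p_prod(x) / p_train(x).
3. Fit the regressor with those ratios as sample weights.

The sketch below makes several assumptions. It uses NumPy and a hypothetical one-feature synthetic dataset with income in $10k units. The helper names (`fit_logistic`, `importance_weights`, `weighted_ridge`) are also hypothetical. One caveat the method does not remove is that reweighting only shifts emphasis among training rows that already exist. It up-weights the lowest observed incomes but cannot supply labels below $25k, so predictions there remain extrapolation.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Gradient-descent logistic regression; callers must add any intercept column."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / len(y) + l2 * w
        w -= lr * grad
    return w


def importance_weights(X_train, X_prod, clip=20.0):
    """Estimate p_prod(x) / p_train(x) at each training row via a domain classifier."""
    X = np.vstack([X_train, X_prod])
    d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_prod))])  # 1 = production
    # Standardize features and prepend an intercept column for the classifier.
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = np.column_stack([np.ones(len(X)), (X - mu) / sd])
    w_clf = fit_logistic(Z, d)
    p = 1.0 / (1.0 + np.exp(-Z[: len(X_train)] @ w_clf))
    # Classifier odds times the inverse class-prior ratio gives the density ratio.
    ratio = (p / (1.0 - p)) * (len(X_train) / len(X_prod))
    # Clip extreme ratios, then normalize so the weights average to 1.
    ratio = np.clip(ratio, 0.0, clip)
    return ratio / ratio.mean()


def weighted_ridge(X, y, weights, alpha=1e-3):
    """Closed-form weighted ridge regression with an unpenalized intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    reg = alpha * np.eye(A.shape[1])
    reg[0, 0] = 0.0  # do not shrink the intercept
    lhs = A.T @ (weights[:, None] * A) + reg
    rhs = A.T @ (weights * y)
    return np.linalg.solve(lhs, rhs)


# Hypothetical synthetic data: training lacks incomes below 2.5 (i.e. < $25k).
rng = np.random.default_rng(0)
X_train = rng.uniform(2.5, 10.0, size=(2000, 1))
X_prod = rng.uniform(0.5, 10.0, size=(2000, 1))  # unlabeled production covariates
y_train = 3.0 + 2.0 * X_train[:, 0] + rng.normal(0.0, 0.1, size=2000)

w = importance_weights(X_train, X_prod)
beta = weighted_ridge(X_train, y_train, w)  # [intercept, slope]
```

Clipping trades a little bias for lower variance. Without it, a few training rows near the support boundary can dominate the weighted fit.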