This question evaluates a data scientist's grasp of loss-function selection for regression—specifically MSE versus MAE—covering optimization behavior, convexity, outlier sensitivity, probabilistic noise assumptions, and alignment of loss with business cost when predicting unscaled car prices in USD.
You are training a regression model to predict car prices in USD. The target variable is not scaled (i.e., still in dollars). Explain when and why you would choose to minimize Mean Squared Error (MSE) instead of Mean Absolute Error (MAE). Address all of the following:
(a) Optimization properties: Contrast the gradient of MSE with the subgradient of MAE (especially at zero residual, where MAE is not differentiable) and the implications for SGD/Adam.
(b) Convexity: State whether each loss is convex in the prediction, and explain why the common claim that "MAE is non-convex" is incorrect.
(c) Sensitivity to outliers and bias: Discuss when a greater penalty on large errors is desirable.
(d) Probabilistic assumptions: Derive the noise model under which minimizing MSE (vs. MAE) yields the maximum likelihood estimate (MLE).
(e) Business fit: Provide one concrete example where squaring dollar errors better matches cost (e.g., luxury models) and one where it does not.
(f) Effect of not scaling the target: Explain how the dollar magnitude interacts with learning rate and regularization.
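As a starting point for parts (a), (c), and (f), the following minimal sketch compares the two losses on a constant predictor. The five prices are made-up illustrative values, and the helper names `mse_grad` and `mae_subgrad` are hypothetical, not taken from any library. It shows that the constant minimizing MSE is the mean while the constant minimizing MAE is the median, and that the MSE gradient is measured in dollars while the MAE subgradient stays within [-1, 1].

```python
import numpy as np

# Hypothetical car prices in USD; the last entry is a luxury outlier.
prices = np.array([18_000.0, 21_000.0, 24_000.0, 27_000.0, 250_000.0])


def mse_grad(pred, y):
    """Gradient of mean((pred - y)**2) with respect to a constant prediction."""
    return 2.0 * np.mean(pred - y)


def mae_subgrad(pred, y):
    """A subgradient of mean(|pred - y|) with respect to a constant prediction.

    np.sign returns 0 at a zero residual, which is one valid choice from
    the subdifferential [-1, 1] at that point.
    """
    return np.mean(np.sign(pred - y))


# The best constant under MSE is the mean; under MAE it is the median.
best_mse = prices.mean()
best_mae = np.median(prices)

# Gradient magnitudes at the same starting guess of $20,000.
g_mse = mse_grad(20_000.0, prices)     # scales with the dollar residuals
g_mae = mae_subgrad(20_000.0, prices)  # bounded regardless of target scale

print(f"MSE-optimal constant: {best_mse:,.0f} USD")
print(f"MAE-optimal constant: {best_mae:,.0f} USD")
print(f"MSE gradient at 20k:  {g_mse:,.1f}")
print(f"MAE subgradient at 20k: {g_mae:.1f}")
```

The single outlier pulls the MSE-optimal constant far above the typical price, while the MAE-optimal constant stays at the middle observation. The gap in gradient magnitude also suggests why a learning rate that works for MAE on unscaled dollars may be far too large for MSE.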