This machine learning question tests understanding of model calibration via temperature scaling: formulating the negative log-likelihood, deriving its analytic gradient with respect to a scalar temperature, and implementing an optimizer to learn that parameter.
You have a trained multi-class classifier that outputs logits z(x) ∈ R^K for input x (the classifier is fixed; only calibration is learned). Temperature scaling calibrates predicted probabilities as:
p_i(x; T) = softmax(z(x) / T)_i = exp(z_i(x) / T) / Σ_{j=1}^{K} exp(z_j(x) / T)
where T > 0 is a single scalar temperature shared across classes and inputs.
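As a concrete illustration of the scaling step (array shapes and function name here are assumptions, not part of the question):

```python
import numpy as np

def temperature_softmax(z, T):
    """Temperature-scaled softmax over the last axis; T > 0 is a scalar."""
    s = z / T
    s = s - s.max(axis=-1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([2.0, 1.0, 0.1])
print(temperature_softmax(z, 1.0))  # sharper distribution
print(temperature_softmax(z, 5.0))  # flattened toward uniform
```

Note that T = 1 recovers the ordinary softmax, T > 1 softens the distribution, and T < 1 sharpens it; the argmax is unchanged for any T > 0, so accuracy is unaffected.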
You are given a held-out validation set with logits and true labels, and you must learn T by minimizing negative log-likelihood (NLL).
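One way the three requested pieces can fit together is sketched below: the NLL is L(T) = -(1/N) Σ_n log p_{y_n}(x_n; T), its gradient with respect to T is computed in closed form, and T is learned by gradient descent on log T to enforce positivity. This is a reference sketch under assumed conventions (logits as an (N, K) NumPy array, labels as integer class indices; all names are illustrative), not the unique intended solution.

```python
import numpy as np

def nll_and_grad(T, logits, labels):
    """NLL of temperature-scaled probabilities and its analytic dNLL/dT.

    logits: (N, K) float array; labels: (N,) int array; T: scalar > 0.
    """
    N = logits.shape[0]
    s = logits / T
    s = s - s.max(axis=1, keepdims=True)          # numerical stability
    log_probs = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)
    nll = -log_probs[np.arange(N), labels].mean()
    # dNLL/dT = (1 / T^2) * mean_n( z_{n,y_n} - sum_j p_{n,j} z_{n,j} )
    grad = (logits[np.arange(N), labels]
            - (probs * logits).sum(axis=1)).mean() / T ** 2
    return nll, grad

def fit_temperature(logits, labels, lr=0.1, steps=500):
    """Minimize validation NLL over T by gradient descent on log T (keeps T > 0)."""
    log_T = 0.0                                    # start at T = 1
    for _ in range(steps):
        T = np.exp(log_T)
        _, g = nll_and_grad(T, logits, labels)
        log_T -= lr * g * T                        # chain rule: dNLL/dlogT = T * dNLL/dT
    return float(np.exp(log_T))
```

A bounded scalar minimizer (e.g. `scipy.optimize.minimize_scalar`) or an L-BFGS step in a deep learning framework would be typical alternatives; plain gradient descent keeps the sketch dependency-free beyond NumPy.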