You have a trained multi-class classifier that outputs logits z(x) ∈ R^K for input x (the classifier is fixed; only calibration is learned). Temperature scaling calibrates predicted probabilities as:
p_i(x; T) = softmax(z(x) / T)_i = exp(z_i(x) / T) / Σ_j exp(z_j(x) / T)
where T > 0 is a single scalar temperature shared across classes and inputs.
You are given a held-out validation set with logits and true labels, and you must learn T by minimizing the negative log-likelihood (NLL) of the true labels on that set.
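Concretely, the objective over N validation examples (x_n, y_n) is NLL(T) = -(1/N) Σ_n log p_{y_n}(x_n; T), a one-dimensional optimization in T. Below is a minimal sketch of this fit using NumPy/SciPy; the names fit_temperature, val_logits, and val_labels are illustrative assumptions, not part of the problem statement.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def nll(T, logits, labels):
    """Average negative log-likelihood of the true labels under softmax(z / T)."""
    scaled = logits / T
    # log p_{y_n}(x_n; T) = z_{y_n}(x_n)/T - logsumexp(z(x_n)/T)
    log_probs = scaled[np.arange(len(labels)), labels] - logsumexp(scaled, axis=1)
    return -log_probs.mean()

def fit_temperature(val_logits, val_labels):
    """Find T > 0 minimizing validation NLL; the classifier itself stays fixed."""
    result = minimize_scalar(
        nll,
        bounds=(1e-2, 1e2),   # search range for T is an assumption
        method="bounded",
        args=(val_logits, val_labels),
    )
    return result.x

# Hypothetical usage:
#   val_logits: (N, K) array of logits, val_labels: (N,) integer class labels
#   T = fit_temperature(val_logits, val_labels)
#   calibrated = scipy.special.softmax(val_logits / T, axis=1)
```

Because T is a single positive scalar, a bounded scalar minimizer suffices; gradient-based optimizers (e.g., L-BFGS on a temperature parameter) are an equally valid choice and give the same minimizer since the NLL is smooth in T.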