Softmax, Argmax, and Cross Entropy

Author: PracHub

Published: 11/17/2025

Quick Overview

This tutorial explains argmax, softmax, and cross-entropy, covering definitions, mathematical formulas, properties of softmax probabilities, loss computation, and worked numerical examples that demonstrate how logits convert to probabilities and are evaluated in neural network classification and Q-learning contexts.

Here is a clean, intuitive explanation of argmax, softmax, and cross-entropy.


1. Argmax

Argmax = “which class has the highest score?”

If your model outputs something like:

scores = [2.1, 0.4, 5.3]

Then:

argmax(scores) = 2   # because 5.3, at index 2 (counting from 0), is the largest
  • It does NOT give the value (5.3)
  • It gives the index of the maximum value

Used in:

  • Classification prediction
  • Q-learning (choose best action = argmax(Q-values))
  • Choosing best probability class after softmax
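As a minimal sketch of the idea, argmax can be written in a few lines of plain Python. The helper name `argmax` and the tie-breaking rule (first maximum wins) are assumptions made for this illustration, not something the tutorial prescribes.

```python
def argmax(values):
    """Return the index of the largest value (the first one wins on ties)."""
    if not values:
        raise ValueError("argmax of an empty sequence is undefined")
    # max() over the indices, compared by the value stored at each index
    return max(range(len(values)), key=lambda i: values[i])


scores = [2.1, 0.4, 5.3]
best_index = argmax(scores)      # index of the maximum
best_value = scores[best_index]  # the maximum itself, looked up separately
print(best_index, best_value)    # 2 5.3
```

The same helper would pick a greedy action in Q-learning if `scores` held the Q-values of each action in the current state.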

2. Softmax

Softmax turns raw model scores (logits) into probabilities:

pᵢ = e^(zᵢ) / Σⱼ e^(zⱼ)

Properties:

  • Every probability is strictly between 0 and 1
  • The probabilities sum to 1
  • Larger scores → larger probability; because of the exponential, gaps between scores are amplified

Example:

Scores:

z = [2, 1, 0]

Softmax:

e^2 = 7.389  
e^1 = 2.718  
e^0 = 1

sum = 11.107

softmax = [7.389/11.107, 2.718/11.107, 1/11.107]
         ≈ [0.665, 0.245, 0.090]

So the model thinks class 0 is most likely.
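The calculation above can be sketched in plain Python. The function name `softmax` is chosen for this illustration, and subtracting the largest logit before exponentiating is an added assumption: it does not change the result, but it keeps `exp` from overflowing on large scores.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Shifting every logit by the max leaves the ratios unchanged
    # and keeps math.exp from overflowing on large inputs.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2, 1, 0])
print([round(p, 3) for p in probs])  # [0.665, 0.245, 0.09]
```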


3. Cross-Entropy Loss

Cross-entropy measures how well the predicted probability distribution matches the true label. Lower is better.

For classification, if the true class is y:

CE = −log(p_y)

Here p_y is the probability the model assigns to class y, and log is the natural logarithm.

  • If the model gives the true class high probability, the loss is small (it reaches 0 when p_y = 1).
  • If the model gives the true class low probability, the loss is large, and it grows without bound as p_y approaches 0.

Examples:

If true class = 0

Model predicts:

p = [0.8, 0.1, 0.1]
Loss = -log(0.8) ≈ 0.223  (very good)

Bad prediction:

p = [0.2, 0.3, 0.5]
Loss = -log(0.2) ≈ 1.609  (bad)

Cross-entropy punishes wrong, confident predictions the most.
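The two examples above can be reproduced with a short sketch. The function name `cross_entropy` and the choice to return infinity when the true class gets zero probability are assumptions made for this illustration.

```python
import math


def cross_entropy(probs, true_class):
    """Negative natural log of the probability assigned to the true class."""
    p_y = probs[true_class]
    if p_y <= 0.0:
        # Assumption for this sketch: report an infinite loss rather than
        # letting math.log raise on zero.
        return math.inf
    return -math.log(p_y)


good = cross_entropy([0.8, 0.1, 0.1], true_class=0)
bad = cross_entropy([0.2, 0.3, 0.5], true_class=0)
print(round(good, 3), round(bad, 3))  # 0.223 1.609
```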


🎯 How these three work together

In neural networks:

  1. Model outputs scores (logits), e.g.

    [2.1, 0.4, 5.3]
    
  2. Softmax converts to probabilities

    ≈ [0.04, 0.01, 0.95]
    
  3. Cross-entropy checks how good the probability for the correct class is, e.g. if true class = 2 → loss = −log(0.95) ≈ 0.05

  4. At prediction time, use argmax to pick the most likely class.
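Because the loss only needs log(p_y), steps 2 and 3 can be folded into one expression: −log(p_y) = log(Σⱼ e^(zⱼ)) − z_y. The sketch below uses that identity on the logits from this list; the function name `cross_entropy_from_logits` is hypothetical, and this is an illustration of the identity rather than any particular library's implementation.

```python
import math


def cross_entropy_from_logits(logits, true_class):
    """Compute -log(softmax(logits)[true_class]) without building the probabilities."""
    shift = max(logits)  # shifting by the max keeps math.exp from overflowing
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[true_class]


logits = [2.1, 0.4, 5.3]
loss = cross_entropy_from_logits(logits, true_class=2)

# Prediction time: index of the largest logit
predicted = max(range(len(logits)), key=lambda i: logits[i])
print(round(loss, 3), predicted)  # 0.047 2
```

Working from the unrounded logits gives a loss near 0.047, in line with the roughly 0.95 probability that softmax assigns to class 2.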


🔥 Concrete Numerical Example (Everything together)

Suppose:

  • Model outputs logits

    z = [1.0, 3.0, 2.0]
    
  • True label = class 1

Step 1 — Softmax

e^1 = 2.718  
e^3 = 20.086
e^2 = 7.389

sum = 30.193

probabilities =
[2.718/30.193, 20.086/30.193, 7.389/30.193]
≈ [0.090, 0.665, 0.245]

Step 2 — Cross-entropy

True class = 1 → p_y ≈ 0.665 (keeping three decimals)

Loss:

CE = -log(0.665) ≈ 0.408

Step 3 — Prediction (argmax)

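argmax(z) = 1 because 3.0 is the largest logit. Softmax preserves the ordering of the logits, so taking argmax of the probabilities gives the same index.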

→ model predicts class 1.
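The three steps can be checked with a few lines of plain Python. This is a sketch using only the standard library; the variable names are chosen for this illustration.

```python
import math

logits = [1.0, 3.0, 2.0]
true_class = 1

# Step 1: softmax turns the logits into probabilities
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Step 2: cross-entropy is the negative log of the true class's probability
loss = -math.log(probs[true_class])

# Step 3: argmax picks the index with the highest probability
predicted = max(range(len(probs)), key=lambda i: probs[i])

print([round(p, 3) for p in probs])  # [0.09, 0.665, 0.245]
print(round(loss, 3))                # 0.408
print(predicted)                     # 1
```

Because the code keeps full precision, the loss comes out near 0.408; plugging in a probability rounded down to 0.66 would give about 0.415 instead.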


Summary Table

Concept       | What it does                                    | Formula             | Example
Argmax        | Picks the index of the largest score            | argmax(z)           | [1, 5, 3] → 1
Softmax       | Converts logits → probabilities                 | e^(zᵢ) / Σⱼ e^(zⱼ)  | [2, 1, 0] → [0.665, 0.245, 0.090]
Cross-Entropy | Measures how wrong the predicted probability is | −log(p_y)           | true = 1, p_y = 0.665 → loss ≈ 0.408
