Explain the vanishing gradient problem in deep neural networks.
In your answer:
- Describe how backpropagation works at a high level and why gradients can vanish in deep networks.
- Show how the choice of activation function (e.g., sigmoid, tanh, ReLU) affects gradient magnitude; a small numeric sketch follows this list.
- Discuss common techniques (including activation choices) to mitigate vanishing gradients.
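As a minimal illustration of the second point, the sketch below (NumPy only) multiplies a gradient by one activation derivative per layer, which is the chain-rule factor that backpropagation applies at each layer. The layer count (30) and the pre-activation value (z = 2.0) are arbitrary choices for illustration, and the sketch deliberately ignores the weight matrices that real backpropagation also multiplies by.

```python
# Sketch: how activation derivatives scale a gradient passed back through
# a deep stack of layers. Depth and z are arbitrary illustrative values.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Derivatives of common activations, evaluated at a pre-activation z.
def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # at most 0.25 (at z = 0), tiny for large |z|

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2    # at most 1.0 (at z = 0), tiny for large |z|

def d_relu(z):
    return np.where(z > 0, 1.0, 0.0)  # exactly 1 where z > 0, else 0

depth = 30   # number of layers the gradient passes through
z = 2.0      # fixed pre-activation value, chosen for illustration

for name, deriv in [("sigmoid", d_sigmoid), ("tanh", d_tanh), ("relu", d_relu)]:
    grad = 1.0
    for _ in range(depth):
        grad *= deriv(z)            # chain rule: one derivative factor per layer
    print(f"{name:8s} gradient after {depth} layers: {grad:.3e}")
```

With these values, the sigmoid and tanh gradients shrink to roughly 10^-30 and 10^-35 after 30 layers, while the ReLU gradient stays at 1 because its derivative is exactly 1 on the active side; an answer can use this kind of comparison to ground the discussion of activation choice.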