Explain why weight initialization matters in deep neural networks.
Then describe common initialization methods (such as random normal/uniform, Xavier/Glorot, and He initialization), covering the points below; a minimal code sketch follows the list:
- How each method chooses the initial weight distribution.
- What problem(s) each method is designed to solve (e.g., vanishing/exploding activations or gradients, symmetry breaking).
- When you would choose each method based on the activation functions used.