Prompt
Answer the following ML fundamentals questions:
- Batch Normalization (BatchNorm):
  - What trainable parameters does BatchNorm have?
  - What statistics are used during training vs inference?
  - Why does BatchNorm help optimization, and what are common pitfalls?
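
For concreteness, a minimal NumPy sketch of the layer these questions refer to; the names (`gamma`, `beta`, `running_mean`, `running_var`) are the conventional ones, assumed here for clarity rather than taken from any particular framework's API:

```python
import numpy as np

class BatchNorm1D:
    """Illustrative BatchNorm over features; input x has shape (batch, features)."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        # Trainable parameters: per-feature scale (gamma) and shift (beta).
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        # Non-trainable buffers: running statistics used at inference.
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum, self.eps = momentum, eps

    def forward(self, x, training=True):
        if training:
            # Training: normalize with the current mini-batch's statistics...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ...and update the running statistics for later inference.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            # Inference: use the frozen running statistics instead.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```

The training/inference asymmetry above is also where the classic pitfalls live: very small batches give noisy statistics, and forgetting to switch to inference mode reuses batch statistics at test time.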
- Optimizers: Compare SGD, Momentum, RMSProp, and Adam.
  - What problem does each address?
  - When might plain SGD generalize better than Adam?
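
As a reference point for the comparison, hedged sketches of the four update rules on a parameter vector `w` with gradient `g`; the hyperparameter defaults are typical values, not prescriptions, and Momentum is written in one common formulation (variants differ in where the learning rate enters):

```python
import numpy as np

def sgd(w, g, lr=0.01):
    # Plain SGD: step against the raw gradient.
    return w - lr * g

def momentum(w, g, v, lr=0.01, beta=0.9):
    # Momentum: accumulate a velocity to damp oscillations and speed up
    # progress along consistently signed gradient directions.
    v = beta * v + g
    return w - lr * v, v

def rmsprop(w, g, s, lr=0.001, beta=0.9, eps=1e-8):
    # RMSProp: divide by a running RMS of gradients so each coordinate
    # gets its own effective step size.
    s = beta * s + (1 - beta) * g**2
    return w - lr * g / (np.sqrt(s) + eps), s

def adam(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: Momentum + RMSProp, with bias correction for the
    # zero-initialized moment estimates; t is the 1-based step count.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

One commonly cited answer to the last question: Adam's per-coordinate adaptive step sizes can reach sharper minima that generalize worse than those found by well-tuned SGD on some tasks.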
- Regularization: Compare L1 and L2 regularization.
  - How do they affect weights and sparsity?
  - How do they relate to MAP estimation (priors), if you know that framing?
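
For reference, a sketch of the per-step effect of each penalty on a weight vector (the soft-thresholding form for L1 is one standard choice, assumed here): L2's gradient `lam * w` shrinks weights proportionally (weight decay; MAP under a Gaussian prior), while L1 subtracts a constant pull `lr * lam` and can set weights exactly to zero (sparsity; MAP under a Laplace prior):

```python
import numpy as np

def l2_step(w, grad_loss, lr=0.1, lam=0.01):
    # d/dw [loss + (lam/2)*||w||^2] = grad_loss + lam*w
    # -> multiplicative shrinkage toward zero (weight decay).
    return w - lr * (grad_loss + lam * w)

def l1_step(w, grad_loss, lr=0.1, lam=0.01):
    # Proximal (soft-threshold) update for the non-differentiable |w| term:
    # take a plain gradient step, then pull each weight toward zero by
    # lr*lam, clamping at zero -- this is what produces exact sparsity.
    w = w - lr * grad_loss
    return np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
```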