Floating-point types and ablation study design
You are training deep neural networks on modern accelerators that support multiple floating-point formats (for example, float64, float32, float16, and bfloat16).
Answer the following:
- What are the main differences between the common floating-point types used in deep learning (e.g., float64, float32, float16, bfloat16)? Describe their trade-offs in numerical precision, dynamic range, memory usage, and training speed.
- During training, how can you detect that numerical precision loss (underflow, overflow, or excessive rounding error) is harming your model?
- Suppose you want to evaluate the impact of floating-point precision on model quality and training stability. Design an ablation study that isolates and measures the effect of the precision choice.
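For the first question, the headline numbers (largest finite value, machine epsilon, storage size) follow directly from each format's bit layout. A minimal sketch, assuming the standard IEEE-754-style encoding of 1 sign bit, E exponent bits, and M mantissa bits:

```python
# (exponent bits, mantissa bits) for each format's IEEE-754-style layout
FORMATS = {
    "float64":  (11, 52),
    "float32":  (8, 23),
    "float16":  (5, 10),
    "bfloat16": (8, 7),   # same exponent width as float32 -> same dynamic range
}

for name, (exp_bits, man_bits) in FORMATS.items():
    bias = 2 ** (exp_bits - 1) - 1
    # Largest finite value: (2 - 2**-M) * 2**bias
    max_val = (2 - 2 ** -man_bits) * 2.0 ** bias
    # Spacing between 1.0 and the next representable value (machine epsilon)
    eps = 2.0 ** -man_bits
    nbytes = (1 + exp_bits + man_bits) // 8
    print(f"{name:9s} max≈{max_val:.3e}  eps≈{eps:.1e}  bytes={nbytes}")
```

Running this makes the trade-off concrete: float16 tops out at 65504 (narrow range, moderate precision), while bfloat16 reaches float32's ≈3.4e38 range but with much coarser precision (eps ≈ 7.8e-3 vs 1.2e-7).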
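For the second question, a common practical approach is to instrument training with checks on gradients and activations. The sketch below is a hypothetical helper (the name `gradient_health` and the zero-fraction threshold are illustrative, not from any library), shown here on a simulated float16 underflow:

```python
import numpy as np

def gradient_health(grad, zero_frac_thresh=0.5):
    """Summarize common symptoms of precision loss in a gradient array.
    Hypothetical helper; the threshold is illustrative only."""
    report = {
        "has_nan": bool(np.isnan(grad).any()),       # NaN, e.g. from inf - inf after overflow
        "has_inf": bool(np.isinf(grad).any()),       # Inf: values exceeded the format's max
        "zero_fraction": float(np.mean(grad == 0)),  # many exact zeros suggest underflow
    }
    report["suspicious"] = (report["has_nan"] or report["has_inf"]
                            or report["zero_fraction"] > zero_frac_thresh)
    return report

# Simulate underflow: gradients of ~1e-8 are below float16's smallest
# subnormal (~6e-8), so casting flushes them all to exactly zero.
small = np.full(1000, 1e-8, dtype=np.float32)
print(gradient_health(small.astype(np.float16)))
```

In a real training loop the same checks would run on per-layer gradients each step (or every N steps), alongside coarser signals such as sudden loss spikes, a loss that goes to NaN, or metrics that plateau only at reduced precision.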
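For the third question, the core of any such ablation is: hold data, initialization, seeds, and hyperparameters fixed; vary only the dtype; and repeat each setting over several seeds so run-to-run variance can be separated from the precision effect. The toy stand-in below (the function `train_linear` is hypothetical; a real study would swap in the full training run) illustrates that protocol with a tiny least-squares fit:

```python
import numpy as np

def train_linear(dtype, seed, steps=200, lr=0.1):
    """Toy stand-in for one training run: fit y = 2x + 1 by gradient
    descent, storing parameters in `dtype`. Everything except dtype
    (data, seed, init, hyperparameters) is held fixed."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, size=256).astype(dtype)
    y = (2 * x + 1).astype(dtype)
    w, b = dtype(0.0), dtype(0.0)
    for _ in range(steps):
        err = w * x + b - y
        w = dtype(w - lr * 2 * np.mean(err * x))  # cast back each step,
        b = dtype(b - lr * 2 * np.mean(err))      # as low-precision training does
    return float(np.mean((w * x + b - y) ** 2))   # final "validation" loss

# The ablation grid: dtype x seed, all else identical.
# bfloat16 is absent from stock NumPy; torch or ml_dtypes would be needed for it.
for dt in (np.float64, np.float32, np.float16):
    losses = [train_linear(dt, seed) for seed in (0, 1, 2)]
    print(f"{dt.__name__:8s} loss range: {min(losses):.2e} .. {max(losses):.2e}")
```

The same structure scales up directly: report quality metrics (final loss, accuracy) and stability metrics (divergence rate, NaN frequency, loss variance across seeds) per dtype, and keep a full-precision (float32 or float64) arm as the control.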