Explain optimization and tensor vs pipeline parallelism
Company: NVIDIA
Role: Software Engineer
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates knowledge of deep learning optimization techniques (quantization, pruning, knowledge distillation, kernel/operator fusion, and memory/throughput optimizations) and of model-parallelism strategies, chiefly tensor versus pipeline parallelism. It measures competency in performance engineering, scalability, communication patterns (collectives), and system-level trade-offs for training and inference. Interviewers commonly ask it to assess an engineer's ability to reason about practical and conceptual trade-offs, bottlenecks, and mitigations in distributed and resource-constrained environments; it falls under the Machine Learning/Deep Learning domain and requires both conceptual understanding and practical application.
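A minimal NumPy sketch of the core distinction the question probes, with "devices" simulated as array shards (the weight names `W1`/`W2` and the two-layer MLP are illustrative assumptions, not part of the question): tensor parallelism splits a single layer's weight matrix across devices and requires a collective (here, a concatenation standing in for an all-gather), while pipeline parallelism assigns whole layers to different devices and only passes activations between stages.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # batch of input activations
W1 = rng.standard_normal((8, 16))      # layer-1 weights (hypothetical sizes)
W2 = rng.standard_normal((16, 8))      # layer-2 weights

# Baseline: a single "device" runs both layers of the toy MLP.
baseline = np.maximum(x @ W1, 0) @ W2

# Tensor parallelism (Megatron-style column split): each simulated device
# holds half of W1's columns, computes its shard of the hidden layer, then
# the shards are gathered (np.concatenate stands in for an all-gather).
hidden_shards = [np.maximum(x @ Ws, 0) for Ws in np.split(W1, 2, axis=1)]
hidden = np.concatenate(hidden_shards, axis=1)   # communication step
tensor_parallel = hidden @ W2

# Pipeline parallelism: device 0 owns layer 1, device 1 owns layer 2;
# only the activation tensor crosses the device boundary.
stage0_out = np.maximum(x @ W1, 0)     # "device 0"
pipeline = stage0_out @ W2             # "device 1"

# Both strategies reproduce the single-device result exactly.
print(np.allclose(baseline, tensor_parallel), np.allclose(baseline, pipeline))
```

The sketch omits what makes the trade-off interesting in practice: tensor parallelism needs a collective inside every layer (bandwidth-bound), while pipeline parallelism communicates less but introduces pipeline bubbles that microbatching must amortize.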