Explain core ML concepts and design choices
Company: Snapchat
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: hard
Interview Round: Onsite
Answer the following ML fundamentals questions:
1) Explain the relationship between cross-entropy and KL divergence, and derive the identity showing that cross-entropy decomposes into entropy plus KL divergence (a worked derivation follows the Quick Answer).
2) Explain how dropout induces a training/testing distribution mismatch; describe practical fixes (e.g., inverted dropout scaling, Monte Carlo dropout) and when to prefer each (see the dropout sketch below).
3) Justify why large language models typically use LayerNorm instead of BatchNorm; discuss the implications for sequence length, micro-batching, and training stability (see the normalization sketch below).
4) Compare optimizers (SGD, SGD with momentum, Adam): update rules, convergence behavior, generalization trade-offs, and when you would choose each (the update rules are sketched below).
5) Explain why PPO introduces a KL-based constraint/penalty (or ratio clipping) and how it stabilizes policy updates; discuss hyperparameter tuning and failure modes (see the clipped-surrogate sketch below).
Quick Answer: This question evaluates mastery of core machine learning fundamentals: probabilistic loss functions (cross-entropy and KL divergence), regularization and uncertainty estimation (dropout and MC dropout), normalization choices in large language models (LayerNorm vs. BatchNorm), optimizer behavior (SGD, momentum, Adam), and stable policy optimization (PPO).
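Worked derivation for question 1, using the standard discrete definitions of cross-entropy H(p, q), entropy H(p), and KL divergence D_KL(p || q):

```latex
% Definitions: H(p, q) = -sum_x p(x) log q(x),  H(p) = -sum_x p(x) log p(x)
\begin{align*}
H(p, q) &= -\sum_{x} p(x)\log q(x) \\
        &= -\sum_{x} p(x)\log p(x) \;+\; \sum_{x} p(x)\log p(x) \;-\; \sum_{x} p(x)\log q(x) \\
        &= H(p) \;+\; \sum_{x} p(x)\log\frac{p(x)}{q(x)} \\
        &= H(p) \;+\; D_{\mathrm{KL}}(p \,\|\, q).
\end{align*}
```

Since H(p) does not depend on q, minimizing cross-entropy over q is equivalent to minimizing D_KL(p || q); for one-hot targets, H(p) = 0 and the two losses coincide exactly.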
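For question 2, a minimal NumPy sketch of inverted dropout (names such as `inverted_dropout` and `drop_prob` are illustrative, not from the question): scaling surviving activations by 1/(1 - drop_prob) at training time keeps the expected activation equal to the test-time activation, so inference needs no correction. Monte Carlo dropout instead keeps the mask active at inference and averages several stochastic passes to estimate predictive uncertainty.

```python
import numpy as np

# Illustrative sketch, not from the original question; names are assumptions.
rng = np.random.default_rng(0)

def inverted_dropout(x, drop_prob, training):
    """Inverted dropout: at training time, zero units with probability
    drop_prob and scale survivors by 1/(1 - drop_prob), so the expected
    activation matches test time and inference is the identity."""
    if not training or drop_prob == 0.0:
        return x  # test time: no mask, no rescaling needed
    keep_prob = 1.0 - drop_prob
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

# Monte Carlo dropout: keep the mask active at inference and average
# several stochastic forward passes to estimate predictive uncertainty.
x = np.ones(5)
mc_mean = np.mean([inverted_dropout(x, 0.5, training=True) for _ in range(2000)], axis=0)
print(mc_mean)  # close to x: the training-time expectation matches test time
```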
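For question 3, a NumPy sketch (illustrative, with an assumed (batch, seq_len, d_model) layout) contrasting the axes the two normalizations reduce over: LayerNorm's statistics are computed per token, so they are unaffected by batch size, padding, or variable sequence length, whereas BatchNorm's statistics couple examples across the batch, which is what makes it fragile under micro-batching and autoregressive decoding.

```python
import numpy as np

# Illustrative sketch, not from the original question; the tensor layout
# (batch, seq_len, d_model) is an assumption.
def layer_norm(x, eps=1e-5):
    """Normalize over the feature (last) axis, independently per token.
    Statistics never mix tokens or examples, so behavior is identical for
    batch size 1, variable sequence lengths, and micro-batched training."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm_train(x, eps=1e-5):
    """Normalize each feature over the batch and sequence axes. Statistics
    couple examples: tiny micro-batches give noisy estimates, padded
    positions pollute them, and inference must switch to running averages,
    creating a train/eval mismatch that LayerNorm avoids."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(2, 7, 16)  # (batch, seq_len, d_model)
print(np.allclose(layer_norm(x).mean(axis=-1), 0.0, atol=1e-6))  # True
```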
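For question 4, the three update rules in a minimal NumPy sketch; the hyperparameter defaults shown are the commonly cited ones, used here only for illustration.

```python
import numpy as np

# Illustrative sketch, not from the original question.
def sgd(w, g, lr):
    return w - lr * g  # step directly along the negative gradient

def sgd_momentum(w, g, v, lr, beta=0.9):
    v = beta * v + g  # velocity: exponentially weighted sum of past gradients
    return w - lr * v, v

def adam(w, g, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g ** 2   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)        # bias corrections for zero initialization
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy check on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 201):
    w, m, v = adam(w, w, m, v, t, lr=0.05)
print(w)  # close to the minimum at 0
```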
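For question 5, a NumPy sketch of the PPO clipped surrogate (names such as `ppo_clip_loss` and `clip_eps` are illustrative): clipping the probability ratio removes the incentive to move the policy far from the behavior policy in a single update, acting as an implicit trust region; the penalty variant instead adds an explicit beta * KL(old || new) term to the loss.

```python
import numpy as np

# Illustrative sketch, not from the original question; names are assumptions.
def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate. The probability ratio pi_new/pi_old is clipped
    to [1 - clip_eps, 1 + clip_eps]; taking the elementwise minimum removes
    any incentive to push the ratio beyond the clip range in one update."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))  # negate: optimizers minimize

# Typical clip_eps is around 0.1-0.3 (0.2 in the PPO paper): too large
# re-admits destructive updates; too small saturates the clip, zeroing
# gradients and stalling learning.
logp_old = np.log(np.array([0.20, 0.50, 0.30]))
logp_new = np.log(np.array([0.25, 0.45, 0.30]))
adv = np.array([1.0, -0.5, 0.2])
print(ppo_clip_loss(logp_new, logp_old, adv))
```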