Explain key ML theory and techniques
Company: Amazon
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: hard
Interview Round: Onsite
##### Question
This Amazon Machine Learning Engineer onsite covers a breadth of core ML theory and applied modeling. Be ready to go deep on each of the following:
1. **XGBoost parallel computation.** Explain how XGBoost achieves parallelism during training. Compare feature-parallel vs. data-parallel (histogram-based) split finding, describe distributed training across machines, and discuss the trade-offs in memory, speed, and accuracy.
2. **Layer normalization in Transformers.** Give the mathematical formulation, explain where it is applied (pre-norm vs. post-norm), why it stabilizes training, and its effect on gradient flow. Contrast it with batch normalization.
3. **Multimodal neural network design.** Design a network that fuses text and images. Describe early/late/cross-attention fusion strategies, how to align modalities, how to handle missing modalities, and how to choose loss functions and evaluation metrics.
4. **Collaborative filtering.** Compare user-based vs. item-based neighborhood methods and matrix factorization (including implicit feedback). Discuss regularization, cold-start mitigation, and scaling to sparse, large datasets.
5. **Multi-armed bandits.** Formulate the problem and define regret. Compare epsilon-greedy, UCB, and Thompson Sampling, address non-stationary and contextual settings, and describe offline policy evaluation and safe deployment.
6. **Logistic regression.** Derive the log-likelihood and gradients, compare L1 vs. L2 regularization, interpret coefficients as odds ratios, and handle class imbalance, calibration, and decision-threshold selection.
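For item 2, a candidate might anchor the discussion in a small numerical sketch like the one below. It is a minimal NumPy illustration, not a reference implementation: the function names (`layer_norm`, `post_norm_block`, `pre_norm_block`), the toy linear `sublayer`, and the tensor shapes are all assumptions chosen for the example. It normalizes each token over its feature dimension, computing `gamma * (x - mean) / sqrt(var + eps) + beta`, and shows where the normalization sits in post-norm and pre-norm residual blocks.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Statistics are taken over the last (feature) axis, separately for
    # every token, so the result does not depend on the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)  # biased (population) variance
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def post_norm_block(x, sublayer, gamma, beta):
    # Post-norm: normalize after adding the residual.
    return layer_norm(x + sublayer(x), gamma, beta)

def pre_norm_block(x, sublayer, gamma, beta):
    # Pre-norm: normalize the sublayer input; the residual path stays an
    # identity, which is the usual argument for easier gradient flow.
    return x + sublayer(layer_norm(x, gamma, beta))

rng = np.random.default_rng(0)
d_model = 8
x = rng.normal(size=(2, 4, d_model))          # (batch, sequence, features)
gamma, beta = np.ones(d_model), np.zeros(d_model)

# Hypothetical stand-in for an attention or feed-forward sublayer.
W = rng.normal(scale=0.1, size=(d_model, d_model))
sublayer = lambda h: h @ W

y = layer_norm(x, gamma, beta)
post_out = post_norm_block(x, sublayer, gamma, beta)
pre_out = pre_norm_block(x, sublayer, gamma, beta)

# Unlike batch normalization, one example's output is unaffected by the
# other examples in the batch.
batch_independent = np.allclose(layer_norm(x[:1], gamma, beta), y[:1])
```

With `gamma = 1` and `beta = 0`, each token in `y` has approximately zero mean and unit variance, and `batch_independent` makes the contrast with batch normalization concrete.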
Quick Answer: This question tests core ML concepts, modeling assumptions, mathematical intuition, training and evaluation trade-offs, and practical failure modes. A strong answer states its assumptions, handles edge cases, explains trade-offs, and shows how to validate the result.
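As one way to show validation for item 6, the sketch below writes out the logistic regression log-likelihood, `sum(y * z - log(1 + exp(z)))` with `z = X @ w`, and its gradient, `X.T @ (y - sigmoid(z))`, then checks the analytic gradient against central finite differences. The synthetic data, the `l2` penalty strength, and all function names are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    # tanh form of the logistic function; avoids overflow for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def log_likelihood(w, X, y, l2=0.0):
    # sum_i [ y_i * z_i - log(1 + exp(z_i)) ] minus an optional L2 penalty.
    z = X @ w
    return np.sum(y * z - np.logaddexp(0.0, z)) - 0.5 * l2 * np.dot(w, w)

def gradient(w, X, y, l2=0.0):
    # d/dw of the penalized log-likelihood: X^T (y - p) - l2 * w.
    return X.T @ (y - sigmoid(X @ w)) - l2 * w

def numerical_gradient(f, w, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        g[j] = (f(w + step) - f(w - step)) / (2.0 * eps)
    return g

# Synthetic data drawn from a known coefficient vector (illustrative only).
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.5])
y = (rng.random(n) < sigmoid(X @ true_w)).astype(float)

w = rng.normal(size=d)
analytic = gradient(w, X, y, l2=0.1)
numeric = numerical_gradient(lambda v: log_likelihood(v, X, y, l2=0.1), w)
max_abs_diff = float(np.max(np.abs(analytic - numeric)))

# exp(coefficient) is the multiplicative change in the odds per unit
# increase in that feature, holding the others fixed.
odds_ratios = np.exp(true_w)
```

A small `max_abs_diff` indicates that the derived gradient agrees with the log-likelihood as implemented. The same check extends to other penalties that are differentiable at the test point.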