Onsite Machine Learning Interview: Multi-topic Questions
Answer all sections. Be precise and compare alternatives where asked. Favor concrete mechanisms over buzzwords.
1) XGBoost Parallelism and Distributed Training
Explain how XGBoost achieves parallel computation during training.
- Compare feature-parallel vs data-parallel (histogram-based) approaches.
- Discuss distributed training across machines (communication pattern, what is aggregated, how splits are found).
- Outline trade-offs in memory, speed, and accuracy (e.g., effect of binning, sparsity, and histogram subtraction).
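Reference sketch for section 1: a toy illustration of histogram-based split finding, with per-shard histograms summed the way an all-reduce would sum them, followed by histogram subtraction. It is not XGBoost's actual implementation; the helper names (build_histogram, best_split), the synthetic data, and the single pre-binned feature are assumptions made for illustration. The gain formula is the usual second-order one with L2 penalty lam and no gamma term.

```python
import numpy as np

def build_histogram(bins, grad, hess, n_bins):
    """Sum gradients and hessians per bin for one pre-binned feature."""
    g = np.bincount(bins, weights=grad, minlength=n_bins)
    h = np.bincount(bins, weights=hess, minlength=n_bins)
    return g, h

def best_split(g_hist, h_hist, lam=1.0):
    """Scan bin boundaries; return (gain, last bin index of the left child)."""
    G, H = g_hist.sum(), h_hist.sum()
    gl, hl = np.cumsum(g_hist)[:-1], np.cumsum(h_hist)[:-1]
    gr, hr = G - gl, H - hl
    gain = 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam))
    k = int(np.argmax(gain))
    return float(gain[k]), k

rng = np.random.default_rng(0)
n, n_bins = 1000, 16
bins = rng.integers(0, n_bins, size=n)     # hypothetical pre-binned feature
grad = rng.normal(size=n) + (bins >= 8)    # gradients shift upward from bin 8
hess = np.ones(n)                          # hessian of squared-error loss

# Data-parallel: each worker builds a histogram on its own row shard,
# then the histograms are summed elementwise (the role of an all-reduce).
shards = np.array_split(np.arange(n), 4)
local = [build_histogram(bins[s], grad[s], hess[s], n_bins) for s in shards]
g_all = sum(g for g, _ in local)
h_all = sum(h for _, h in local)

# Single-machine reference histogram, for comparison.
g_ref, h_ref = build_histogram(bins, grad, hess, n_bins)
gain, split_bin = best_split(g_all, h_all)

# Histogram subtraction: build only the left child, derive the right child.
left = bins <= split_bin
g_left, h_left = build_histogram(bins[left], grad[left], hess[left], n_bins)
g_right, h_right = g_all - g_left, h_all - h_left
```

Only the fixed-size histograms cross the network here, so communication scales with features times bins rather than with rows; the price is that candidate thresholds are limited to bin boundaries.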
2) Layer Normalization in Transformers
Explain layer normalization in Transformers.
- Where it is applied: pre-norm vs post-norm architectures.
- Mathematical formulation of LayerNorm.
- Why it stabilizes training and its effect on gradient flow and depth.
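Reference sketch for section 2: LayerNorm normalizes each token vector over its feature dimension, y = gamma * (x - mean) / sqrt(var + eps) + beta, with mean and (biased) variance computed per token. The NumPy code below is a minimal illustration; the function names and the generic `sublayer` callable (standing in for attention or the feed-forward network) are assumptions, not any framework's API.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the last (feature) axis, then scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)   # biased variance, one value per token
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def post_norm_block(x, sublayer, gamma, beta):
    """Post-norm: normalize after the residual addition."""
    return layer_norm(x + sublayer(x), gamma, beta)

def pre_norm_block(x, sublayer, gamma, beta):
    """Pre-norm: normalize the sublayer input; the residual path is untouched."""
    return x + sublayer(layer_norm(x, gamma, beta))

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 8))            # (batch, tokens, features)
gamma, beta = np.ones(8), np.zeros(8)
y = layer_norm(x, gamma, beta)
```

In the pre-norm form the identity path from input to output is never rescaled, which is the usual argument for easier gradient flow in deep stacks.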
3) Designing a Multimodal Text–Image Model
Design a multimodal neural network that fuses text and images.
- Describe early, late, and cross-attention fusion strategies (architectural sketches, when to use each).
- How to align modalities (shared embedding spaces, contrastive pretraining, token alignment) and handle missing modalities at inference.
- Choose appropriate loss functions per task and evaluation metrics.
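Reference sketch for section 3: one common way to align modalities in a shared embedding space is a symmetric contrastive (InfoNCE-style) loss over a batch of matched text-image pairs. The code below assumes the encoders already produced fixed-size embeddings; the function name, the temperature value, and the random stand-in embeddings are illustrative assumptions.

```python
import numpy as np

def log_softmax(z, axis):
    """Numerically stable log-softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of each input is assumed to be a matched pair."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature          # scaled cosine similarities
    idx = np.arange(len(t))
    loss_t2i = -log_softmax(logits, axis=1)[idx, idx].mean()  # text -> image
    loss_i2t = -log_softmax(logits, axis=0)[idx, idx].mean()  # image -> text
    return 0.5 * (loss_t2i + loss_i2t)

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 32))              # stand-in for encoder outputs
aligned = contrastive_loss(emb, emb)        # every pair matches
shuffled = contrastive_loss(emb, emb[::-1]) # every pair is mismatched
```

A model trained this way yields comparable per-modality embeddings, which is one reason late-fusion systems can still score an example when one modality is absent.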
4) Collaborative Filtering
Explain collaborative filtering approaches.
- User-based vs item-based CF (similarities, neighborhoods, scalability).
- Matrix factorization for implicit feedback, including regularization.
- Cold-start mitigation strategies.
- Scaling to sparse, large datasets.
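Reference sketch for section 4: a small dense version of confidence-weighted alternating least squares for implicit feedback, where preferences are p_ui = 1 if r_ui > 0 and confidences are c_ui = 1 + alpha * r_ui. The function name, hyperparameter values, and toy interaction matrix are assumptions; a production version would use sparse matrices and precompute Y^T Y rather than loop over dense rows.

```python
import numpy as np

def implicit_als(R, k=2, alpha=40.0, lam=0.1, iters=10, seed=0):
    """Alternate closed-form ridge solves for user and item factors."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)             # binary preferences
    C = 1.0 + alpha * R                   # confidence weights
    X = 0.1 * rng.normal(size=(n_users, k))
    Y = 0.1 * rng.normal(size=(n_items, k))
    reg = lam * np.eye(k)                 # L2 regularization term
    for _ in range(iters):
        for u in range(n_users):
            YtC = Y.T * C[u]              # (k, n_items), columns weighted by c_ui
            X[u] = np.linalg.solve(YtC @ Y + reg, YtC @ P[u])
        for i in range(n_items):
            XtC = X.T * C[:, i]           # (k, n_users), columns weighted by c_ui
            Y[i] = np.linalg.solve(XtC @ X + reg, XtC @ P[:, i])
    return X, Y

# Toy interaction counts: two user groups with disjoint item tastes.
R = np.array([[3, 1, 0, 0],
              [2, 4, 0, 0],
              [0, 0, 5, 1],
              [0, 0, 2, 3]], dtype=float)
X, Y = implicit_als(R)
scores = X @ Y.T
```

Unobserved entries are treated as weak negatives (confidence 1) rather than missing, which is the main difference from explicit-rating factorization.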
5) Multi-Armed Bandits
Discuss multi-armed bandits.
- Define regret.
- Compare epsilon-greedy, UCB, and Thompson sampling.
- Address non-stationary and contextual settings.
- Describe offline policy evaluation and safe deployment practices.
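Reference sketch for section 5: a stationary Bernoulli bandit simulation that accumulates pseudo-regret, the sum over rounds of (best arm mean minus chosen arm mean), for epsilon-greedy, UCB1, and Thompson sampling with Beta(1, 1) priors. The arm means, horizon, and epsilon are arbitrary illustrative choices, and `run_bandit` is a hypothetical helper.

```python
import numpy as np

def run_bandit(policy, means, T=5000, eps=0.1, seed=0):
    """Simulate one policy and return its cumulative pseudo-regret."""
    rng = np.random.default_rng(seed)
    K = len(means)
    pulls = np.zeros(K)
    wins = np.zeros(K)
    regret = 0.0
    for t in range(1, T + 1):
        if policy == "eps_greedy":
            # Explore uniformly with probability eps (or until every arm is tried).
            if rng.random() < eps or pulls.min() == 0:
                a = int(rng.integers(K))
            else:
                a = int(np.argmax(wins / pulls))
        elif policy == "ucb1":
            if pulls.min() == 0:
                a = int(np.argmin(pulls))         # pull each arm once first
            else:
                bonus = np.sqrt(2 * np.log(t) / pulls)
                a = int(np.argmax(wins / pulls + bonus))
        elif policy == "thompson":
            # Sample a plausible mean per arm from its Beta posterior.
            a = int(np.argmax(rng.beta(1 + wins, 1 + pulls - wins)))
        else:
            raise ValueError(policy)
        reward = float(rng.random() < means[a])
        pulls[a] += 1
        wins[a] += reward
        regret += max(means) - means[a]
    return regret

means = [0.2, 0.5, 0.7]
T = 5000
regrets = {p: run_bandit(p, means, T=T) for p in ("eps_greedy", "ucb1", "thompson")}
uniform_regret = T * (max(means) - float(np.mean(means)))  # uniform random play
```

Epsilon-greedy with a fixed epsilon keeps paying a linear exploration cost, whereas UCB1 and Thompson sampling shrink exploration as evidence accumulates; a single seeded run like this one illustrates the mechanics only and is not a benchmark.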
6) Logistic Regression
For logistic regression:
- Derive the log-likelihood and gradients.
- Compare L1 vs L2 regularization.
- Interpret coefficients as odds ratios.
- Handle class imbalance and calibration.
- Choose decision thresholds under class imbalance/costs.
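Reference sketch for section 6: with z = Xw and labels y in {0, 1}, the log-likelihood is sum_i [y_i z_i - log(1 + exp(z_i))] and its gradient is X^T (y - sigmoid(z)). The code checks that gradient numerically, reads coefficients as odds ratios via exp(w_j), and computes the cost-minimizing threshold cost_fp / (cost_fp + cost_fn), which assumes calibrated probabilities. The synthetic data and all function names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    """sum_i [ y_i z_i - log(1 + exp(z_i)) ] with z = X w."""
    z = X @ w
    return np.sum(y * z - np.logaddexp(0.0, z))

def gradient(w, X, y):
    """Gradient of the log-likelihood: X^T (y - p)."""
    return X.T @ (y - sigmoid(X @ w))

def cost_threshold(cost_fp, cost_fn):
    """Predict positive when p exceeds this value (calibrated p assumed)."""
    return cost_fp / (cost_fp + cost_fn)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # intercept + 2 features
w_true = np.array([-1.0, 2.0, -0.5])
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)

# Central finite-difference check of the analytic gradient.
w = np.array([0.1, -0.2, 0.3])
h = 1e-6
numeric = np.array([
    (log_likelihood(w + h * e, X, y) - log_likelihood(w - h * e, X, y)) / (2 * h)
    for e in np.eye(3)
])
analytic = gradient(w, X, y)

# A one-unit increase in feature j multiplies the odds by exp(w_j).
odds_ratios = np.exp(w_true)

# If a false negative costs 9x a false positive, the threshold drops to 0.1.
threshold = cost_threshold(cost_fp=1.0, cost_fn=9.0)
```

Subtracting lam * ||w||^2 (L2) from the objective keeps it smooth and shrinks all coefficients, while an L1 penalty is non-differentiable at zero and tends to drive some coefficients exactly to zero.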