Answer each of the following topics concisely but in depth:
Explain how XGBoost achieves parallelism during training. State what can and cannot be parallelized and why.
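A useful anchor for the answer: boosting itself is sequential (each tree fits the gradients of the previous ensemble), but the split search *within* a node is independent per feature, which is what XGBoost parallelizes (with OpenMP). The sketch below is an illustrative, simplified histogram-based split search, not XGBoost's actual implementation; `best_split_gain`, the bin count, and the toy data are all assumptions for demonstration.

```python
import numpy as np

def best_split_gain(x, grad, hess, n_bins=4, lam=1.0):
    """Best split gain for ONE feature via gradient/hessian histograms.
    Each feature's scan is independent, so this loop parallelizes cleanly."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    idx = np.digitize(x, edges)                        # bin index per row
    G = np.bincount(idx, weights=grad, minlength=n_bins)
    H = np.bincount(idx, weights=hess, minlength=n_bins)
    G_tot, H_tot = G.sum(), H.sum()
    best, GL, HL = 0.0, 0.0, 0.0
    for b in range(n_bins - 1):                        # candidate split after bin b
        GL, HL = GL + G[b], HL + H[b]
        GR, HR = G_tot - GL, H_tot - HL
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                      - G_tot**2 / (H_tot + lam))
        best = max(best, gain)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float)                        # only feature 0 is informative
p = np.full(200, 0.5)
grad, hess = p - y, p * (1 - p)                        # logistic-loss derivatives
gains = [best_split_gain(X[:, j], grad, hess) for j in range(3)]
```

The per-feature loop at the bottom is the parallelizable part; the outer boosting loop (refitting `grad`/`hess` after each tree) cannot be parallelized because each tree depends on the previous ensemble's predictions.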
Explain layer normalization in a Transformer block, including where it is applied (pre-LN vs post-LN), the formula, and why it is used instead of batch normalization.
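A minimal numeric sketch of the formula, y = γ · (x − μ)/√(σ² + ε) + β, with statistics taken per token over the feature dimension (shapes batch × seq × d_model are an assumption here). This is why it suits Transformers better than batch normalization: it is independent of batch size and sequence length.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize over the LAST (feature) axis, per token -- unlike
    # BatchNorm, which averages over the batch axis.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.random.default_rng(1).normal(size=(2, 3, 8))    # batch x seq x d_model
y = layer_norm(x, gamma=np.ones(8), beta=np.zeros(8))

# Placement (schematically):
#   pre-LN : h = h + sublayer(layer_norm(h))
#   post-LN: h = layer_norm(h + sublayer(h))
```

After normalization each token's features have mean ≈ 0 and standard deviation ≈ 1, which the test below checks.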
Describe a general architecture for a multimodal neural network (e.g., text + image, or tabular + text). Include common fusion strategies and how to handle missing modalities.
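One possible concat-fusion sketch, with toy hand-rolled "encoders" standing in for real text/image models; the placeholder embeddings for missing modalities are zero vectors here, though in practice they would typically be learned. All names and dimensions below are illustrative assumptions.

```python
import numpy as np

D_TXT, D_IMG = 4, 6
txt_placeholder = np.zeros(D_TXT)      # stand-in for a learned "missing" token
img_placeholder = np.zeros(D_IMG)

def encode_text(t):
    # Toy text encoder: a few character statistics instead of a real model.
    return np.array([len(t), t.count(" "), sum(map(ord, t)) % 7, 1.0])

def encode_image(img):
    # Toy image encoder: per-channel means and stds of an HxWxC array.
    return np.concatenate([img.mean(axis=(0, 1)), img.std(axis=(0, 1))])

def fuse(text=None, image=None):
    # Each modality is encoded separately; embeddings are concatenated
    # (early/concat fusion) and would then feed a shared prediction head.
    t = encode_text(text) if text is not None else txt_placeholder
    i = encode_image(image) if image is not None else img_placeholder
    return np.concatenate([t, i])

z_full = fuse("a cat", np.ones((8, 8, 3)))
z_text_only = fuse("a cat", None)      # image modality missing
```

Late fusion would instead run a separate head per modality and combine the predictions; the concat variant shown is the simplest to implement and a common baseline.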
Explain how collaborative filtering works, contrasting memory-based and model-based approaches. Provide the core formulas and how predictions are made.
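For the memory-based side, a small worked example of the classic user-user prediction formula, r̂(u,i) = r̄_u + Σ_v sim(u,v)·(r(v,i) − r̄_v) / Σ_v |sim(u,v)|, with mean-centered cosine (Pearson-style) similarity over co-rated items. The 4×4 ratings matrix is invented for illustration; 0 marks an unrated cell.

```python
import numpy as np

R = np.array([[5., 4., 0., 1.],
              [4., 5., 1., 0.],
              [1., 0., 5., 4.],
              [0., 1., 4., 5.]])            # rows = users, cols = items, 0 = unrated

def predict(R, u, i):
    mask = R > 0
    means = np.array([R[v][mask[v]].mean() for v in range(len(R))])
    num = den = 0.0
    for v in range(len(R)):
        if v == u or not mask[v, i]:
            continue
        common = mask[u] & mask[v]          # items both users rated
        if not common.any():
            continue
        a = R[u, common] - means[u]
        b = R[v, common] - means[v]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        num += sim * (R[v, i] - means[v])   # weighted, mean-centered deviation
        den += abs(sim)
    return means[u] + (num / den if den else 0.0)

pred = predict(R, u=0, i=2)                 # user 0's missing rating for item 2
```

User 0 is similar to user 1 (who rated item 2 low) and dissimilar to users 2 and 3 (who rated it high), so the prediction lands well below user 0's mean; a model-based approach would instead factor `R` into low-rank user/item embeddings.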
Formulate the K-armed bandit problem and present at least two solution algorithms (e.g., UCB, Thompson Sampling). Show a small numeric example and discuss regret.
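A runnable numeric example of UCB1 on a 3-armed Bernoulli bandit (the true means below are invented for illustration): play the arm maximizing the empirical mean plus the exploration bonus √(2·ln t / n_a), and measure realized regret against always playing the best arm.

```python
import numpy as np

rng = np.random.default_rng(42)
true_means = np.array([0.2, 0.5, 0.8])      # unknown to the algorithm
T = 2000
counts = np.zeros(3)
sums = np.zeros(3)

for t in range(T):
    if t < 3:
        a = t                               # initialize: play each arm once
    else:
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        a = int(np.argmax(ucb))             # optimism in the face of uncertainty
    reward = float(rng.random() < true_means[a])
    counts[a] += 1
    sums[a] += reward

regret = T * true_means.max() - sums.sum()  # realized regret vs. best arm
```

UCB1's bonus shrinks as an arm is sampled, so suboptimal arms are pulled only O(log T) times and regret grows logarithmically; Thompson Sampling would instead keep a Beta posterior per arm and play the argmax of one posterior draw each round.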
Derive logistic regression from a probabilistic viewpoint, provide the log-likelihood and gradient, and interpret coefficients. Mention regularization and decision boundaries.
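The derivation can be checked numerically: with p_i = σ(w·x_i), the log-likelihood is ℓ(w) = Σ_i [y_i log p_i + (1 − y_i) log(1 − p_i)] and its gradient is ∇ℓ = Xᵀ(y − p), so plain gradient ascent recovers the coefficients. The synthetic data and learning rate below are illustrative assumptions; an L2 (ridge) penalty would simply subtract λw from the gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
w_true = np.array([2.0, -1.0])
y = (rng.random(300) < sigmoid(X @ w_true)).astype(float)   # Bernoulli labels

w = np.zeros(2)
for _ in range(2000):
    p = sigmoid(X @ w)
    w += 0.1 * X.T @ (y - p) / len(y)       # ascent step on the log-likelihood

# Interpretation: w[j] is the change in log-odds of y=1 per unit increase
# in feature j; the decision boundary is the hyperplane w . x = 0.
acc = float(((sigmoid(X @ w) > 0.5) == (y == 1.0)).mean())
```

The fitted signs match `w_true` (positive first coefficient, negative second), which is the coefficient-interpretation part of the question made concrete.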