Explain self-attention, LoRA, Adam vs SGD, ViT
Company: Netflix
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
Answer the following ML/Deep Learning interview questions:
1) **Describe self-attention** in Transformer models. What are the queries, keys, and values, and how is the attention output computed?
2) **Why are attention logits divided by \(\sqrt{d_k}\)** (where \(d_k\) is the key/query dimension) before the softmax?
3) **Describe LoRA (Low-Rank Adaptation)** for fine-tuning large models. How does it modify the weight update during fine-tuning, and what are its main benefits?
4) **Why does LoRA often reduce GPU memory consumption** compared to full fine-tuning?
5) **What is the difference between Adam and SGD** (including SGD with momentum)? When might you prefer one over the other?
6) **Compare Vision Transformers (ViT) and CNNs**. What are the main pros and cons of each?
7) **What factors influence the choice of ViT patch size** (e.g., 8×8 vs 16×16 vs 32×32), and what are the trade-offs?
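For questions 1 and 2, the mechanics can be made concrete with a minimal single-head sketch in NumPy (the function name and shapes are illustrative, not from any particular library):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X:  (seq_len, d_model) token embeddings
    Wq, Wk, Wv: projections mapping d_model -> d_k (or d_v)
    """
    Q = X @ Wq          # queries: what each token is looking for
    K = X @ Wk          # keys: what each token offers to be matched against
    V = X @ Wv          # values: the content that gets mixed together
    d_k = Q.shape[-1]

    # Divide logits by sqrt(d_k) so their variance stays roughly constant
    # as d_k grows, keeping the softmax out of its saturated,
    # vanishing-gradient regime (question 2).
    logits = (Q @ K.T) / np.sqrt(d_k)

    # Row-wise softmax -> attention weights; each row sums to 1
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    return weights @ V  # (seq_len, d_v): per-token weighted mixture of values

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 16, 8, 5
X = rng.normal(size=(seq_len, d_model))
out = self_attention(X,
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)))
print(out.shape)  # (5, 8)
```

A real Transformer layer runs several such heads in parallel and concatenates their outputs, but the Q/K/V roles and the scaling are exactly as above.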
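For questions 3 and 4, a minimal sketch of the LoRA idea (class name and initialization scales are illustrative): the pretrained weight `W` is frozen and only a rank-`r` update `B @ A` is trained, so gradients and optimizer state exist only for the two small matrices.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: freeze W, learn a low-rank update B @ A.

    Effective weight: W + (alpha / r) * B @ A, where
      A: (r, d_in)  small random init
      B: (d_out, r) zero init, so the update starts at exactly 0.
    Only A and B are trainable. This is the source of the memory savings
    in question 4: gradients and optimizer state (e.g. Adam's two moment
    buffers) are kept only for the tiny A and B, not for W.
    """
    def __init__(self, W, r=4, alpha=8):
        self.W = W                       # frozen pretrained weight (d_out, d_in)
        d_out, d_in = W.shape
        rng = np.random.default_rng(0)
        self.A = rng.normal(scale=0.01, size=(r, d_in))  # trainable
        self.B = np.zeros((d_out, r))                    # trainable, starts at 0
        self.scale = alpha / r

    def __call__(self, x):
        # (A @ x) then (B @ ...): the full d_out x d_in update is never formed
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

W = np.eye(6)                 # stand-in for a pretrained weight matrix
layer = LoRALinear(W, r=2)
x = np.arange(6.0)
print(np.allclose(layer(x), W @ x))  # True: B = 0, so output is unchanged at init
```

For a `d_out x d_in` layer, full fine-tuning trains `d_out * d_in` parameters, while LoRA trains only `r * (d_in + d_out)`; for large layers with small `r` this is orders of magnitude fewer, and the A/B pair can be merged into `W` after training so inference cost is unchanged.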
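For question 5, the two update rules can be written out directly. The sketch below (hyperparameter values are illustrative defaults) also demonstrates the key behavioral difference: Adam's per-parameter normalization makes its first step roughly `lr` regardless of gradient magnitude, while SGD's step is proportional to the gradient.

```python
import numpy as np

def sgd_momentum_step(w, g, v, lr=0.1, beta=0.9):
    """SGD with momentum: accumulate a velocity, then step along it."""
    v = beta * v + g
    return w - lr * v, v

def adam_step(w, g, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: per-parameter step size from bias-corrected moment estimates."""
    m = b1 * m + (1 - b1) * g         # EMA of gradients (1st moment)
    s = b2 * s + (1 - b2) * g ** 2    # EMA of squared gradients (2nd moment)
    m_hat = m / (1 - b1 ** t)         # bias correction; t starts at 1
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

w = np.zeros(2)
g = np.array([1e-4, 1e4])             # wildly different gradient scales
w_sgd, _ = sgd_momentum_step(w, g, v=np.zeros(2), lr=0.1)
w_adam, _, _ = adam_step(w, g, m=np.zeros(2), s=np.zeros(2), t=1, lr=0.001)
print(w_sgd)   # steps scale with g: approx [-1e-05, -1e+03]
print(w_adam)  # both steps approx -lr = -0.001, independent of gradient scale
```

This scale invariance is why Adam needs little per-problem learning-rate tuning and dominates for Transformers and sparse gradients, while well-tuned SGD with momentum is cheaper in memory (no second-moment buffer) and often generalizes at least as well on vision tasks.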
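For questions 6 and 7, the patch-size trade-off follows directly from how a ViT tokenizes an image. A minimal patchify sketch (function name is illustrative) makes the arithmetic visible: halving the patch size quadruples the token count, and self-attention cost grows with the square of the token count.

```python
import numpy as np

def patchify(img, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Smaller patches -> more tokens -> finer spatial detail but much more
    attention compute; larger patches -> cheaper but coarser resolution.
    """
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    n_h, n_w = H // patch, W // patch
    x = img.reshape(n_h, patch, n_w, patch, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(n_h * n_w, patch * patch * C)  # (num_tokens, token_dim)

img = np.zeros((224, 224, 3))  # standard ImageNet-style input size
for p in (8, 16, 32):
    tokens = patchify(img, p)
    # attention cost per layer scales with num_tokens ** 2
    print(p, tokens.shape, tokens.shape[0] ** 2)
```

At 224x224, patch 8 yields 784 tokens, patch 16 yields 196, and patch 32 only 49, so the 8x8 model pays roughly 16x the attention cost of the 16x16 model per layer. CNNs avoid this choice entirely: their convolutional inductive bias (locality, translation equivariance) makes them data-efficient at fine resolution, whereas ViTs trade that bias for global receptive fields and better scaling with large datasets.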
Quick Answer: This question set evaluates understanding of modern machine learning and deep learning topics: self-attention mechanics (queries, keys, values, and scaled logits), Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning and its memory savings, optimizer behavior (Adam versus SGD with momentum), and architectural trade-offs between Vision Transformers and CNNs, including patch-size considerations. It is commonly asked because it probes both conceptual understanding and practical application, testing reasoning about training dynamics, model scaling, fine-tuning strategies, and resource/performance trade-offs.