PracHub

Explain self-attention, LoRA, Adam vs SGD, ViT

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of modern Machine Learning/Deep Learning topics, including self-attention mechanics (queries, keys, values and scaled logits), Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning and memory savings, optimizer behavior (Adam versus SGD with momentum), and architectural trade-offs between Vision Transformers and CNNs including patch-size considerations. It is categorized under Machine Learning and is commonly asked because it probes both conceptual understanding and practical application—testing reasoning about training dynamics, model scaling, fine-tuning strategies, and resource/performance trade-offs.

Company: Netflix

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

Answer the following ML/Deep Learning interview questions:

  1. Describe self-attention in Transformer models. What are the queries, keys, and values, and how is the attention output computed?
  2. Why are attention logits divided by \(\sqrt{d_k}\) (where \(d_k\) is the key/query dimension) before the softmax?
  3. Describe LoRA (Low-Rank Adaptation) for fine-tuning large models. How does it modify the weight update during fine-tuning, and what are its main benefits?
  4. Why does LoRA often reduce GPU memory consumption compared to full fine-tuning?
  5. What is the difference between Adam and SGD (including SGD with momentum)? When might you prefer one over the other?
  6. Compare Vision Transformers (ViT) and CNNs. What are the main pros and cons of each?
  7. What factors influence the choice of ViT patch size (e.g., 8×8 vs 16×16 vs 32×32), and what are the trade-offs?
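For the questions on self-attention and the \(\sqrt{d_k}\) scaling, a small worked example may help. The sketch below is not taken from the original page: it is a minimal single-head implementation in plain Python, and the helper names (`matmul`, `softmax`, `self_attention`, `random_matrix`) and the toy sizes are assumptions chosen for illustration. It computes softmax(Q Kᵀ / sqrt(d_k)) V, where Q, K, and V are linear projections of the same input tokens.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative only).

    x is n_tokens x d_model; w_q and w_k are d_model x d_k; w_v is d_model x d_v.
    Returns the attended output and the n_tokens x n_tokens weight matrix.
    """
    q = matmul(x, w_q)  # queries: what each token is looking for
    k = matmul(x, w_k)  # keys: what each token exposes for matching
    v = matmul(x, w_v)  # values: the content that gets mixed together
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    logits = matmul(q, k_t)  # one query-key dot product per token pair
    # Dividing by sqrt(d_k) keeps the logit scale roughly independent of d_k.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in logits]
    return matmul(weights, v), weights


def random_matrix(rng, rows, cols):
    """Hypothetical helper: matrix of standard-normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_tokens, d_model, d_k = 4, 8, 8  # toy sizes, assumed for the demo
x = random_matrix(rng, n_tokens, d_model)
w_q = random_matrix(rng, d_model, d_k)
w_k = random_matrix(rng, d_model, d_k)
w_v = random_matrix(rng, d_model, d_k)
out, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` is a probability distribution over the tokens, and each row of `out` is the corresponding weighted average of the value vectors. On the scaling: if the components of a query and a key are independent with zero mean and unit variance, their dot product has variance \(d_k\), so dividing by \(\sqrt{d_k}\) brings the logits back to roughly unit variance and keeps the softmax away from the saturated region where gradients become very small.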
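For the two LoRA questions, the following sketch may make the idea concrete. It is an illustration under stated assumptions, not an excerpt from any library: the function names and the 4096 x 4096 layer size are hypothetical. LoRA freezes the pretrained weight W and learns a low-rank update, so the layer computes W x + (alpha / r) B A x, with A of shape r x d_in and B of shape d_out x r.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(x, w, a, b, alpha, rank):
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y + scale * d for y, d in zip(base, delta)]


def param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. LoRA on one weight matrix."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


# Hypothetical layer size and rank, chosen only for the arithmetic.
full, lora = param_counts(4096, 4096, 8)
ratio = full // lora

# Tiny forward pass: B starts at zero, so the adapted layer initially
# reproduces the frozen layer exactly.
w = [[1.0, 2.0], [3.0, 4.0]]
a = [[0.5, -0.5]]        # rank 1, shape r x d_in
b = [[0.0], [0.0]]       # shape d_out x r, zero-initialized
x = [1.0, 1.0]
y_base = matvec(w, x)
y_lora = lora_forward(x, w, a, b, alpha=16, rank=1)
```

In this example `full` is 16,777,216 and `lora` is 65,536, a 256x reduction in trainable parameters for that matrix. This is the main reason memory often drops: gradients and optimizer state (for Adam, two moment estimates per trainable parameter) are only kept for A and B. The frozen weights and the activations still have to be stored, so the saving is usually smaller than the parameter ratio alone suggests.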
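For the Adam versus SGD question, comparing single update steps side by side can clarify the difference. The sketch below is a scalar toy under assumed hyperparameters (the function names and default values are illustrative, not any framework's API). SGD with momentum moves in proportion to the gradient, while Adam divides a bias-corrected running mean of the gradient by the square root of a bias-corrected running mean of its square.

```python
import math


def sgd_momentum_step(param, grad, buf, lr=0.1, momentum=0.9):
    """One SGD-with-momentum step on a scalar parameter."""
    buf = momentum * buf + grad
    return param - lr * buf, buf


def adam_step(param, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step on a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# Take one step from 0.0 with a tiny gradient and with a huge gradient.
sgd_moves = []
adam_moves = []
for grad in (0.01, 100.0):
    new_param, _ = sgd_momentum_step(0.0, grad, 0.0)
    sgd_moves.append(abs(new_param))
    new_param, _, _ = adam_step(0.0, grad, 0.0, 0.0, t=1)
    adam_moves.append(abs(new_param))
```

Here the SGD step sizes differ by a factor of 10,000, matching the gradients, whereas both Adam steps are close to the learning rate. That per-parameter normalization is why Adam is often less sensitive to gradient scale and learning-rate tuning, at the cost of two extra state values per parameter instead of one. SGD with momentum is cheaper in memory and is often reported to generalize well on convolutional vision models when its learning-rate schedule is tuned carefully.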
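For the patch-size question, the trade-off is easiest to see as arithmetic. The sketch below assumes a square 224 x 224 input and ignores any class token; both are assumptions made for illustration, and `vit_token_stats` is a hypothetical helper. It counts the tokens produced by each patch size and the number of query-key pairs that full self-attention scores per head per layer.

```python
def vit_token_stats(image_size, patch_size):
    """Return (num_tokens, attention_pairs) for a square image and patch."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    tokens = per_side * per_side
    # Full self-attention scores every token against every token.
    return tokens, tokens * tokens


stats = {p: vit_token_stats(224, p) for p in (8, 16, 32)}
for patch, (tokens, pairs) in stats.items():
    print(f"patch {patch:>2}: {tokens:>4} tokens, {pairs:>7} attention pairs")
```

Under these assumptions, 8×8 patches give 784 tokens, 16×16 give 196, and 32×32 give 49. Halving the patch side quadruples the token count and multiplies the attention pairs by 16, so smaller patches preserve finer spatial detail but cost much more compute and memory, while larger patches are cheap but discard detail inside each patch.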

Feb 23, 2026, 12:00 AM
