PracHub

Derive and compare core ML and RL methods

Last updated: May 2, 2026

Quick Overview

This question evaluates a candidate's mastery of core machine learning and reinforcement learning concepts: optimization (gradient methods and batch‑size trade‑offs), supervised versus unsupervised algorithms, policy‑gradient RL and variance reduction, deep RL stability techniques, sequence model complexity (Transformers vs RNNs), embeddings and polysemy, and low‑compute fine‑tuning strategies. It probes both theoretical understanding and practical engineering judgment about convergence, variance, computational and memory complexity, representation learning, and resource‑constrained model adaptation.


Company: Amazon

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen



Oct 13, 2025, 9:49 PM

ML Fundamentals Technical Screen — Multi‑part Question

Context: You are given a set of core machine learning topics to address rigorously. For each part, state assumptions, give equations, reason about trade‑offs, and compute requested quantities.

  1. Gradient methods
  • Given an empirical risk L(w) = (1/n) ∑_{i=1..n} ℓ_i(w), derive the update rules for:
      a) Full‑batch gradient descent (GD)
      b) Stochastic gradient descent (SGD) and mini‑batch SGD
  • Compare convergence properties, gradient variance, and wall‑clock efficiency. Explain when SGD outperforms GD.
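The two update rules can be sketched side by side. The block below is a minimal illustration, assuming a scalar parameter and the squared loss ℓ_i(w) = ½(w·x_i − y_i)²; the loss, the synthetic data, the step size, and all function names are hypothetical choices made for the example, not part of the question.

```python
import random

def grad_i(w, x, y):
    """Gradient of l_i(w) = 0.5 * (w * x - y)**2 with respect to scalar w."""
    return (w * x - y) * x

def gd_step(w, data, lr):
    # GD: w <- w - lr * (1/n) * sum over all i of grad l_i(w)
    g = sum(grad_i(w, x, y) for x, y in data) / len(data)
    return w - lr * g

def sgd_step(w, data, lr, batch_size, rng):
    # Mini-batch SGD: draw a batch B uniformly without replacement, then
    # w <- w - lr * (1/|B|) * sum over i in B of grad l_i(w).
    # batch_size=1 gives plain SGD; batch_size=n recovers GD.
    batch = rng.sample(data, batch_size)
    g = sum(grad_i(w, x, y) for x, y in batch) / batch_size
    return w - lr * g

def loss(w, data):
    return sum(0.5 * (w * x - y) ** 2 for x, y in data) / len(data)

# Synthetic, noise-free data with true slope 3.0 (an assumption of this sketch).
rng = random.Random(0)
data = [(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(1000))]

w_gd = w_sgd = 0.0
for _ in range(200):
    w_gd = gd_step(w_gd, data, lr=0.5)                               # 1000 gradients per step
    w_sgd = sgd_step(w_sgd, data, lr=0.5, batch_size=10, rng=rng)    # 10 gradients per step
```

Each mini-batch step here costs about 1/100 of a full-batch step, which is the wall-clock argument for SGD when n is large. The price is gradient noise: on noisy data a constant step size leaves SGD hovering around the minimizer, so a decaying schedule is typically needed for exact convergence.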
  2. Batch size and steps
  • Define batch size.
  • With n = 50,000 samples, epochs = 5:
      a) For batch size b = 200, compute updates per epoch and total updates.
      b) For b = 2,000, compute new steps and propose a learning‑rate adjustment via the linear scaling rule. Explain when this rule fails or needs modification.
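The step counts follow from steps per epoch = ⌈n / b⌉. A small sketch of the arithmetic is given here; the helper names and the base learning rate of 0.1 are hypothetical, chosen only to make the linear scaling rule concrete.

```python
import math

def update_steps(n_samples, batch_size, epochs, drop_last=False):
    """Return (updates per epoch, total updates)."""
    if drop_last:
        per_epoch = n_samples // batch_size
    else:
        per_epoch = math.ceil(n_samples / batch_size)
    return per_epoch, per_epoch * epochs

def linear_scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: multiply the learning rate by the batch-size ratio."""
    return base_lr * (new_batch / base_batch)

small = update_steps(50_000, 200, 5)     # (250, 1250)
large = update_steps(50_000, 2_000, 5)   # (25, 125)

# Hypothetical base learning rate of 0.1 at b = 200; the rule suggests 10x at b = 2,000.
scaled_lr = linear_scaled_lr(0.1, 200, 2_000)
```

Going from b = 200 to b = 2,000 cuts the number of updates by 10×, so the rule compensates with a 10× larger step. It is a heuristic: it tends to need a warmup phase, and it can break down at very large batch sizes, early in training when the loss surface changes quickly, or with adaptive optimizers, where a square-root scaling is sometimes used instead.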
  3. Supervised vs. unsupervised
  • Classify each algorithm and give one use case: logistic regression, SVM, k‑NN, k‑means, PCA, t‑SNE, Isolation Forest.
  4. Reinforcement learning and policy gradients
  • Relate RL to supervised and unsupervised learning.
  • Write the REINFORCE gradient ∇θ J = E[∑_t ∇θ log πθ(a_t|s_t) G_t]. Show how a baseline b_t keeps the estimator unbiased while reducing variance.
  • For a length‑3 trajectory with returns G = [3, 1, −1] and score‑function terms g1, g2, g3, use a constant baseline b = mean(G) to express the sample gradient.
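For the numeric part, b = mean(G) = 1, so the sample gradient is (3 − 1)·g1 + (1 − 1)·g2 + (−1 − 1)·g3 = 2·g1 − 2·g3. The sketch below computes those coefficients and numerically checks the identity behind unbiasedness, E_a[∇θ log πθ(a|s)] = 0, for a softmax policy over three actions. The softmax parameterization and the example logits are assumptions of this illustration, and b is treated as a fixed constant rather than a function of the sampled actions.

```python
import math

def baseline_coefficients(returns):
    """Weights (G_t - b) on each score term g_t, with b = mean(returns)."""
    b = sum(returns) / len(returns)
    return [G - b for G in returns]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def score(logits, action):
    """Gradient of log softmax(logits)[action] with respect to the logits."""
    probs = softmax(logits)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]

def expected_score(logits):
    """E_{a ~ pi}[grad log pi(a)], computed exactly by summing over actions."""
    probs = softmax(logits)
    dim = len(logits)
    return [sum(probs[a] * score(logits, a)[k] for a in range(dim))
            for k in range(dim)]

coeffs = baseline_coefficients([3, 1, -1])   # [2.0, 0.0, -2.0]
zero_vec = expected_score([0.2, -1.0, 0.5])  # each entry is ~0 up to rounding
```

Because the expected score is zero, subtracting any action-independent b_t changes the estimator's expectation by b_t · 0 = 0, while a well-chosen b_t shrinks the magnitude of the weights on the score terms and hence the variance.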
  5. Deep RL integrations
  • Explain how neural networks are used in RL (e.g., DQN, policy gradient, actor‑critic).
  • For DQN, describe why target networks and experience replay stabilize training, and name a failure mode without them.
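The two DQN stabilizers can be sketched without a neural network. The block below is a simplified illustration, not a full DQN: `ReplayBuffer` and `td_target` are hypothetical names, and the Q-values passed to `td_target` are assumed to come from a periodically synced copy of the network (the target network).

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions; uniform sampling breaks temporal correlation."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def td_target(reward, next_q_target, done, gamma=0.99):
    """Bootstrap target y = r + gamma * max_a' Q_target(s', a').

    next_q_target must come from the frozen target network, so the
    regression target does not shift with every online-network update.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buf.sample(2)
y = td_target(reward=1.0, next_q_target=[0.5, 2.0], done=False, gamma=0.9)
```

Without replay, consecutive and highly correlated transitions dominate each update; without a target network, the online network chases a target it is itself moving. Either can lead to oscillating or diverging Q-value estimates.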
  6. Transformers vs. RNNs
  • Contrast parallelism, handling of long‑range dependencies, and complexity.
  • For sequence length n = 1024 and model dimension d = 512, estimate asymptotic time and memory costs of self‑attention. Name two techniques that mitigate quadratic scaling.
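Self-attention costs O(n²·d) time and O(n²) memory for the score matrix, against O(n·d²) time for a recurrent layer that must run sequentially. A rough count for n = 1024, d = 512 is sketched below; it assumes a single layer, a single head, fp32 scores, and counts only the two n×n matrix products, ignoring the linear projections. The function name and dictionary keys are hypothetical.

```python
def attention_cost(n, d, bytes_per_score=4):
    """Rough per-layer, single-head cost of full self-attention."""
    return {
        "qk_macs": n * n * d,                    # Q @ K^T
        "av_macs": n * n * d,                    # softmax(scores) @ V
        "score_entries": n * n,                  # size of the attention matrix
        "score_bytes": n * n * bytes_per_score,
        "rnn_macs": n * d * d,                   # one d x d recurrent matmul per step
    }

cost = attention_cost(1024, 512)
# qk_macs       = 536,870,912  (about 5.4e8 multiply-adds)
# score_entries = 1,048,576    (4 MiB at fp32)
# rnn_macs      = 268,435,456
```

With h heads of dimension d/h the multiply-add total is unchanged, but the score matrix grows to h·n² entries. Commonly cited mitigations include sparse or local-window attention and low-rank or kernel-based linear attention, which reduce the n² term; IO-aware exact kernels such as FlashAttention keep the same FLOPs but avoid materializing the full n×n matrix in memory.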
  7. Embeddings and polysemy
  • Define embeddings and polysemy.
  • Propose a method to distinguish the word ‘King’ in chess vs. monarchy contexts using contextual encoders or multi‑sense embeddings.
  • Outline one intrinsic evaluation (e.g., word sense disambiguation) and one extrinsic evaluation (e.g., downstream task accuracy).
  8. Low‑compute fine‑tuning plan (7B model, single 24‑GB GPU)
  • Design a low‑compute fine‑tuning approach (e.g., QLoRA or adapters, 4‑bit quantization, gradient checkpointing, mixed precision).
  • Choose a LoRA rank r and specify batch size, sequence length, optimizer, and learning‑rate schedule.
  • Provide a back‑of‑the‑envelope estimate of trainable parameters using hidden size ≈ 4096 and ~32 layers. State assumptions about which projections you adapt.
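A LoRA adapter on a d_in × d_out projection adds two matrices, A (d_in × r) and B (r × d_out), for r·(d_in + d_out) trainable parameters. The estimate below is a sketch under stated assumptions: every adapted projection is a square 4096 × 4096 matrix (no grouped-query attention), rank r = 16 is an illustrative choice, and the function name is hypothetical.

```python
def lora_trainable_params(hidden=4096, layers=32, rank=16, projections_per_layer=2):
    """Trainable LoRA parameters, assuming square hidden x hidden projections."""
    per_matrix = rank * (hidden + hidden)  # A: hidden x r, plus B: r x hidden
    return per_matrix * projections_per_layer * layers

qv_only = lora_trainable_params(projections_per_layer=2)    # q and v: 8,388,608
all_attn = lora_trainable_params(projections_per_layer=4)   # q, k, v, o: 16,777,216

# Share of a 7B-parameter base model that is trainable when adapting q and v only.
fraction = qv_only / 7e9
```

At r = 16 on the query and value projections this comes to roughly 8.4M trainable parameters, about 0.12% of a 7B model. For scale, 7B weights stored in 4 bits occupy about 3.5 GB, which leaves most of a 24‑GB card for activations, adapter gradients, and optimizer state.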
