PyTorch Training And Model Implementation
Asked of: Machine Learning Engineer
What's being tested
Tests implementation fluency for ML algorithms and `PyTorch` models: tensor shapes, gradients, optimization steps, and clean modular code. Interviewers look for whether you can translate math and model architecture into correct, runnable code while handling edge cases and reasoning about runtime complexity.
Patterns & templates
- PyTorch training loop: `model.train()`, move the batch to `device`, `optimizer.zero_grad()`, forward pass and loss, `loss.backward()`, `optimizer.step()`; track the loss as a Python number (e.g. `loss.item()`) so no graphs are retained.
- Tensor shape discipline: state shapes at every layer, e.g. transformer input `(B, T, C)`, attention logits `(B, H, T, T)`; many shape bugs are silent broadcasting errors.
- Masked self-attention: compute `QK^T / sqrt(d_k)`, apply the causal mask with `masked_fill(..., -inf)`, then `softmax`; ensure no future-token leakage.
- Residual block template: `x = x + attention(norm(x))`, then `x = x + mlp(norm(x))`; know the pre-norm vs post-norm stability tradeoff.
- Manual SGD derivation: for MSE linear regression with `y_hat = Xw + b` and `L = mean((y_hat - y)^2)`, `dL/dw = (2/n) X^T (y_hat - y)` and `dL/db = (2/n) sum(y_hat - y)`; update in-place carefully.
- K-means loop: assign points to the nearest centroid, recompute means, stop on convergence or max iterations; handle empty clusters deterministically.
- Classic DSA helpers: interval merge via sort by start, `O(n log n)`; top-k frequency via heap `O(n log k)` or bucket sort `O(n)`.
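The K-means loop from the list above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the initialization from the first `k` points, and the rule that an empty cluster keeps its previous centroid are all assumptions chosen to make the behavior deterministic.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd-style K-means; returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Assumption: deterministic initialization from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    dim = len(points[0])

    for _ in range(max_iters):
        # Assignment step: nearest centroid; min() breaks ties by lowest index.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels

        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # assumption: an empty cluster keeps its old centroid
            centroids[j] = tuple(
                sum(p[d] for p in members) / len(members) for d in range(dim)
            )

    return centroids, labels


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(data, k=2)
```

Each iteration costs `O(n * k * d)` for `n` points of dimension `d`, which is worth stating aloud alongside the code. Other empty-cluster policies (for example, re-seeding from the farthest point) are equally defensible as long as they are deterministic.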
Common pitfalls
Pitfall: Forgetting `optimizer.zero_grad()` accumulates gradients across batches and produces misleadingly unstable training.
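A loop that avoids this pitfall can be sketched as follows. It is a minimal example, assuming a toy noiseless regression problem and a single `nn.Linear` model; the helper name `train_one_epoch`, the synthetic data, and the hyperparameters are illustrative assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, loss_fn, device):
    """Run one pass over the loader and return the mean per-sample loss."""
    model.train()
    total_loss, n_seen = 0.0, 0
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()            # clear gradients left from the previous batch
        loss = loss_fn(model(xb), yb)    # forward pass
        loss.backward()                  # populate .grad on the parameters
        optimizer.step()                 # apply the update
        # .item() returns a Python float, so no autograd graph is kept alive.
        total_loss += loss.item() * xb.size(0)
        n_seen += xb.size(0)
    return total_loss / n_seen


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assumption: synthetic linear data, X is (256, 3) and y is (256, 1).
X = torch.randn(256, 3)
true_w = torch.tensor([[1.0], [-2.0], [0.5]])
y = X @ true_w + 0.1

loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
model = nn.Linear(3, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

losses = [train_one_epoch(model, loader, optimizer, loss_fn, device) for _ in range(20)]
```

Deleting the `optimizer.zero_grad()` line makes each step use the sum of all gradients seen so far, which is the unstable behavior described above.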
Pitfall: Building a GPT block without a causal mask turns it into bidirectional attention and invalidates decoder-only behavior.
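One way to check for leakage is to perturb the last token and confirm that earlier positions do not change. The sketch below assumes inputs already split into heads with shape `(B, H, T, D)`; the function name `causal_self_attention` and the random inputs are illustrative assumptions.

```python
import math

import torch


def causal_self_attention(q, k, v):
    """Scaled dot-product attention with a causal mask.

    q, k, v: tensors of shape (B, H, T, D). Returns (output, weights).
    """
    T, d_k = q.size(-2), q.size(-1)
    logits = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # (B, H, T, T)
    # Lower-triangular mask: position t may attend to positions <= t only.
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=q.device))
    logits = logits.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(logits, dim=-1)                # rows sum to 1
    return weights @ v, weights                            # (B, H, T, D), (B, H, T, T)


torch.manual_seed(0)
B, H, T, D = 2, 4, 5, 8
q, k, v = (torch.randn(B, H, T, D) for _ in range(3))
out, weights = causal_self_attention(q, k, v)

# Leakage check: change only the last token's key and value.
k2, v2 = k.clone(), v.clone()
k2[:, :, -1] += 1.0
v2[:, :, -1] += 1.0
out2, _ = causal_self_attention(q, k2, v2)

# With a correct mask, outputs at positions 0..T-2 are unaffected.
no_leak = torch.allclose(out[:, :, :-1], out2[:, :, :-1])
```

Removing the `masked_fill` line makes `no_leak` false, because every position then attends to the modified last token.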
Pitfall: Explaining algorithms conceptually but not giving tensor dimensions, update equations, or runtime complexity will look shallow.
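Stating shapes and update equations alongside the code might look like the sketch below for MSE linear regression. It assumes the loss convention `L = mean((Xw + b - y)^2)`; the variable names and synthetic data are illustrative, and autograd is used only to check the hand-derived gradients.

```python
import torch

torch.manual_seed(0)
n, d = 64, 3
X = torch.randn(n, d)                              # (n, d)
y = X @ torch.tensor([2.0, -1.0, 0.5]) + 3.0       # (n,)

w = torch.zeros(d, requires_grad=True)             # (d,)
b = torch.zeros(1, requires_grad=True)             # (1,)

# Hand-derived gradients of L = mean((Xw + b - y)^2):
#   dL/dw = (2/n) X^T (Xw + b - y),  dL/db = (2/n) sum(Xw + b - y)
with torch.no_grad():
    residual = X @ w + b - y                       # (n,)
    grad_w = (2.0 / n) * (X.T @ residual)          # (d,)
    grad_b = (2.0 / n) * residual.sum()            # scalar

# Autograd check at the same point (w = 0, b = 0).
loss = ((X @ w + b - y) ** 2).mean()
loss.backward()
grads_match = (
    torch.allclose(w.grad, grad_w, atol=1e-4)
    and torch.allclose(b.grad, grad_b, atol=1e-4)
)

# Manual gradient descent with in-place updates; no_grad keeps the
# updates themselves out of the autograd graph.
lr = 0.1
with torch.no_grad():
    for _ in range(200):
        residual = X @ w + b - y
        w -= lr * (2.0 / n) * (X.T @ residual)
        b -= lr * (2.0 / n) * residual.sum()
```

Each step costs `O(n * d)` for the two matrix-vector products. This sketch uses the full batch for simplicity; a stochastic variant would sample a mini-batch of rows before computing `residual`.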
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions
- Implement SGD for linear regression and derive gradients (Amazon · Machine Learning Engineer · Technical Screen · medium)
- Implement K-means and solve interval/frequency tasks (Amazon · Machine Learning Engineer · Onsite · medium)
- Implement PyTorch training loop (Amazon · Machine Learning Engineer · Onsite · medium)
- Implement decoder-only GPT-style transformer (Amazon · Machine Learning Engineer · Onsite · medium)
Related concepts
- ML Fundamentals: Backprop, Attention, And RL (Machine Learning)
- Transformer Training Pipeline Debugging (Machine Learning)
- ML Frameworks, Model Compilation, And Parallelism (ML System Design)
- Machine Learning Model Design And Evaluation (Machine Learning)
- Machine Learning Project Lifecycle (Machine Learning)
- Machine Learning System Design For Real-Time Decisions (Machine Learning)