PyTorch Training And Model Implementation

What's being tested

Tests implementation fluency for ML algorithms and `PyTorch` models: tensor shapes, gradients, optimization steps, and clean modular code. Interviewers look for whether you can translate math and model architecture into correct, runnable code while handling edge cases and stating runtime complexity.

Patterns & templates

PyTorch training loop — `model.train()`, move the batch to `device`, `optimizer.zero_grad()`, forward pass and loss, `loss.backward()`, `optimizer.step()`; log the loss via `loss.item()` so the autograd graph is not retained.
Tensor shape discipline — state shapes at every layer, e.g. transformer input `(B, T, C)`, attention logits `(B, H, T, T)`; many shape bugs are silent broadcasting errors rather than crashes.
Masked self-attention — compute $QK^T / \sqrt{d_k}$, apply the causal mask with `masked_fill(..., -inf)`, then softmax over the key dimension; ensure no future-token leakage.
Residual block template (pre-norm form) — `x = x + attention(norm(x))`, then `x = x + mlp(norm(x))`, with a separate norm layer for each sub-block; know the pre-norm vs post-norm stability tradeoff.
Manual SGD derivation — for MSE linear regression with $\hat y = Xw + b$, $\nabla_w = \frac{2}{n}X^T(\hat y - y)$ and $\nabla_b = \frac{2}{n}\sum(\hat y - y)$; compute both gradients from the current parameters before updating either in place.
K-means loop — assign points to nearest centroid, recompute means, stop on convergence or max iterations; handle empty clusters deterministically.
Classic DSA helpers — interval merge via sort by start O(n log n); top-k frequency via heap O(n log k) or bucket sort O(n).
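
For the training-loop pattern above, a minimal sketch might look like the following. The helper name `train_one_epoch`, the synthetic regression data, the batch size, and the learning rate are all assumptions chosen for illustration, not part of the original material.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, loss_fn, device):
    """Run one pass over `loader` and return the mean per-sample loss."""
    model.train()  # enable training behavior for dropout / batch norm
    total_loss, n_seen = 0.0, 0
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()            # clear gradients left from the previous step
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        # .item() returns a Python float, so no autograd graph is kept alive
        total_loss += loss.item() * xb.size(0)
        n_seen += xb.size(0)
    return total_loss / n_seen


# Hypothetical toy problem: noiseless linear regression with 4 features.
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
X = torch.randn(256, 4)
y = X @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]]) + 0.1
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Linear(4, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
losses = [
    train_one_epoch(model, loader, optimizer, nn.MSELoss(), device)
    for _ in range(20)
]
```

Accumulating `loss.item()` rather than the `loss` tensor is the design choice that matters here: summing tensors would keep every batch's graph reachable for the whole epoch.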
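
To make the shape-discipline point concrete, here is a small illustration of a silent broadcasting error. The tensors are made-up values; the scenario (a model returning `(N, 1)` while labels are `(N,)`) is an assumed but common setup.

```python
import torch

pred = torch.tensor([[1.0], [2.0], [3.0], [4.0]])   # model output, shape (N, 1)
target = torch.tensor([1.0, 2.0, 3.0, 4.0])         # labels, shape (N,)

# (N, 1) - (N,) broadcasts to (N, N): every prediction is compared
# against every label, and no error is raised.
diff_bad = pred - target
loss_bad = (diff_bad ** 2).mean()

# Aligning the shapes first gives the intended elementwise difference.
diff_ok = pred.squeeze(-1) - target                 # shape (N,)
loss_ok = (diff_ok ** 2).mean()
```

Predictions here match the labels exactly, so the intended loss is zero, yet the broadcast version reports a nonzero value. Asserting shapes before the subtraction is one cheap guard.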
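
The masked self-attention steps listed above can be sketched as follows, with shapes annotated per line. The function name `causal_self_attention` and the assumption that `q`, `k`, `v` are already split into heads with shape `(B, H, T, D)` are illustrative choices.

```python
import math
import torch


def causal_self_attention(q, k, v):
    """q, k, v: (B, H, T, D). Returns output (B, H, T, D) and weights (B, H, T, T)."""
    T, D = q.size(-2), q.size(-1)
    logits = q @ k.transpose(-2, -1) / math.sqrt(D)            # (B, H, T, T)
    # True above the diagonal marks future positions that must be hidden.
    future = torch.triu(
        torch.ones(T, T, dtype=torch.bool, device=q.device), diagonal=1
    )
    logits = logits.masked_fill(future, float("-inf"))         # mask broadcasts over B, H
    weights = torch.softmax(logits, dim=-1)                    # each row sums to 1
    return weights @ v, weights


torch.manual_seed(0)
B, H, T, D = 2, 4, 5, 8
q, k, v = (torch.randn(B, H, T, D) for _ in range(3))
out, weights = causal_self_attention(q, k, v)
```

A quick leakage check is to perturb the last token's key and value and confirm that outputs at earlier positions do not change.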
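
One possible rendering of the pre-norm residual block template is sketched below. The class name `PreNormBlock`, the use of `nn.MultiheadAttention` for the attention sub-layer, the GELU MLP, and the `mlp_ratio` parameter are assumptions for illustration; a hand-written attention module would slot in the same way.

```python
import torch
from torch import nn


class PreNormBlock(nn.Module):
    """Pre-norm transformer block: normalize, transform, then add the residual."""

    def __init__(self, d_model: int, n_heads: int, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, mlp_ratio * d_model),
            nn.GELU(),
            nn.Linear(mlp_ratio * d_model, d_model),
        )

    def forward(self, x):                                  # x: (B, T, C)
        T = x.size(1)
        # Boolean mask: True means "not allowed to attend" (future tokens).
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out                                   # residual 1, still (B, T, C)
        x = x + self.mlp(self.norm2(x))                    # residual 2, still (B, T, C)
        return x


torch.manual_seed(0)
block = PreNormBlock(d_model=16, n_heads=4)
x = torch.randn(2, 5, 16)
y = block(x)
```

In the post-norm variant the normalization would instead wrap the sum, e.g. `x = norm(x + attention(x))`; only the pre-norm form is shown here.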
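
The MSE gradients from the manual SGD entry can be checked against autograd and then used in a hand-written update loop. This sketch assumes the loss is the mean squared error of $\hat y = Xw + b$ and, for simplicity, uses full-batch gradient descent rather than sampled mini-batches; the data, learning rate, and helper name `mse_grads` are illustrative.

```python
import torch

torch.manual_seed(0)
n, d = 200, 3
X = torch.randn(n, d)
true_w, true_b = torch.tensor([2.0, -1.0, 0.5]), 0.7
y = X @ true_w + true_b

w, b, lr = torch.zeros(d), torch.zeros(()), 0.1


def mse_grads(X, y, w, b):
    """Closed-form gradients of mean((Xw + b - y)^2) with respect to w and b."""
    residual = X @ w + b - y                    # (n,)
    grad_w = 2.0 / len(y) * (X.T @ residual)    # (d,)
    grad_b = 2.0 / len(y) * residual.sum()      # scalar
    return grad_w, grad_b


# Cross-check the closed form against autograd at the starting point.
w_ag = w.clone().requires_grad_(True)
b_ag = b.clone().requires_grad_(True)
((X @ w_ag + b_ag - y) ** 2).mean().backward()
grad_w0, grad_b0 = mse_grads(X, y, w, b)
grads_match = torch.allclose(grad_w0, w_ag.grad, atol=1e-5) and torch.allclose(
    grad_b0, b_ag.grad, atol=1e-5
)

for _ in range(500):
    # Both gradients come from the same (old) parameters before either update.
    grad_w, grad_b = mse_grads(X, y, w, b)
    w -= lr * grad_w
    b -= lr * grad_b
```

Computing `grad_b` after `w` has already been updated would mix old and new parameters, which is the usual way an in-place update goes subtly wrong.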
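
A compact sketch of the K-means loop described above follows. The initialization (sampling `k` data points with a seeded generator), the convergence test on centroid shift, and the empty-cluster policy (keep the previous centroid) are assumptions; other deterministic policies, such as re-seeding from the farthest point, are equally valid.

```python
import torch


def kmeans(points, k, max_iters=100, tol=1e-6, seed=0):
    """points: (N, D). Returns centroids (k, D) and labels (N,)."""
    gen = torch.Generator().manual_seed(seed)
    init = torch.randperm(points.size(0), generator=gen)[:k]
    centroids = points[init].clone()
    for _ in range(max_iters):
        dists = torch.cdist(points, centroids)       # (N, k) pairwise distances
        labels = dists.argmin(dim=1)                 # (N,) nearest centroid index
        new_centroids = centroids.clone()
        for j in range(k):
            members = points[labels == j]
            if members.numel() > 0:
                new_centroids[j] = members.mean(dim=0)
            # Empty cluster: keep the old centroid (assumed policy).
        shift = (new_centroids - centroids).norm()
        centroids = new_centroids
        if shift <= tol:                             # converged
            break
    labels = torch.cdist(points, centroids).argmin(dim=1)
    return centroids, labels


# Two well-separated synthetic blobs.
torch.manual_seed(0)
blob_a = 0.1 * torch.randn(50, 2)
blob_b = 0.1 * torch.randn(50, 2) + 10.0
points = torch.cat([blob_a, blob_b])
centroids, labels = kmeans(points, k=2)
```

Each iteration costs O(N k D) for the distance computation, which is worth stating alongside the code.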
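
The DSA helpers in the last entry can be sketched in plain Python as below. Function names are illustrative, and the heap version assumes items are mutually comparable, since ties on count fall back to comparing the items themselves.

```python
import heapq
from collections import Counter


def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals; O(n log n) from the sort."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)   # extend the current interval
        else:
            merged.append([start, end])
    return merged


def top_k_heap(items, k):
    """Top-k most frequent items with a size-k min-heap; O(n log k)."""
    heap = []
    for item, count in Counter(items).items():
        heapq.heappush(heap, (count, item))
        if len(heap) > k:
            heapq.heappop(heap)                       # drop the least frequent
    return [item for count, item in sorted(heap, reverse=True)]


def top_k_bucket(items, k):
    """Top-k most frequent items via buckets indexed by count; O(n)."""
    if k <= 0:
        return []
    buckets = [[] for _ in range(len(items) + 1)]
    for item, count in Counter(items).items():
        buckets[count].append(item)
    result = []
    for count in range(len(buckets) - 1, 0, -1):      # highest count first
        for item in buckets[count]:
            result.append(item)
            if len(result) == k:
                return result
    return result


merged = merge_intervals([[1, 3], [2, 6], [8, 10], [15, 18]])
frequent_heap = top_k_heap([1, 1, 1, 2, 2, 3], 2)
frequent_bucket = top_k_bucket([1, 1, 1, 2, 2, 3], 2)
```

The bucket approach works because no item can appear more than `len(items)` times, so counts index directly into a fixed-size list.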

Common pitfalls

Pitfall: Forgetting `optimizer.zero_grad()` lets gradients accumulate across batches, because PyTorch sums each new gradient into `.grad`; the result is unstable training that is easy to misdiagnose.
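
The accumulation behavior can be seen on a two-element tensor. This is a toy illustration with made-up values; the manual `w.grad.zero_()` call stands in for the reset that `optimizer.zero_grad()` provides in a real loop.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()          # gradient of sum(w^2) is 2w

(w ** 2).sum().backward()       # second backward pass with no reset in between
accumulated = w.grad.clone()    # the two gradients have been summed into .grad

w.grad.zero_()                  # manual reset of the stored gradient
(w ** 2).sum().backward()
fresh = w.grad.clone()          # back to the single-pass gradient
```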

Pitfall: Building a GPT block without a causal mask turns it into bidirectional attention and invalidates decoder-only behavior.

Pitfall: Explaining algorithms only conceptually, without tensor dimensions, update equations, or runtime complexity, will look shallow.

Practice these

The practice cards below cover the canonical variants — solve all of them and time yourself.
