Debug a transformer training pipeline
Company: OpenAI
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: Hard
Interview Round: Technical Screen
You are given a PyTorch Transformer training pipeline whose loss diverges and whose validation accuracy remains near random. The repo includes tokenization, padding/masking, a multi-head-attention encoder-decoder, AdamW with weight decay, gradient clipping, a cosine LR scheduler with warmup, mixed precision, and teacher forcing. Identify four distinct bugs that could plausibly cause these symptoms (e.g., incorrect attention mask construction or wrong mask dtype/device, an off-by-one label shift, missing gradient zeroing, misplaced gradient scaling under mixed precision, a wrong positional encoding shape/broadcast, applying weight decay to LayerNorm/bias parameters that should be excluded, or data leakage in batching). For each bug, explain the failure mode, show a minimal code fix, and write a unit test or runtime assertion that catches it. Then propose a debugging plan (sanity checks such as a copy task, gradient/activation statistics, and NaN detection), the metrics to monitor, and a small experiment to verify each fix.
Quick Answer: This question evaluates debugging and troubleshooting skills for deep learning training pipelines. It tests understanding of Transformer internals, masking and positional encodings, optimization and numerical stability in PyTorch, and data/label alignment, along with the ability to write unit tests and assertions that guard against regressions. Illustrative sketches for several of the candidate bugs follow.
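Example fix 1, attention mask construction and dtype/device: PyTorch's nn.MultiheadAttention treats True entries in a boolean mask as positions to exclude, and the mask must live on the input's device. A minimal sketch; the helper names and pad_id are hypothetical.

```python
import torch

def make_causal_mask(seq_len: int, device: torch.device) -> torch.Tensor:
    # Bool mask, True above the diagonal = positions a query must NOT attend to
    # (PyTorch masks out True entries of a boolean attn_mask).
    return torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=device), diagonal=1
    )

def make_padding_mask(tokens: torch.Tensor, pad_id: int) -> torch.Tensor:
    # (batch, seq_len) bool, True where the token is padding.
    return tokens.eq(pad_id)

tokens = torch.tensor([[5, 7, 9, 0, 0]])  # hypothetical batch, pad_id = 0
causal = make_causal_mask(tokens.size(1), tokens.device)
padding = make_padding_mask(tokens, pad_id=0)

# Runtime assertions that catch the classic dtype/device/inversion bugs early:
assert causal.dtype == torch.bool and padding.dtype == torch.bool
assert causal.device == tokens.device
assert not causal.diagonal().any(), "a token must be able to attend to itself"
assert not padding.all(dim=1).any(), "a fully padded row yields NaN attention"
```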
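Example fix 2, off-by-one label shift: with teacher forcing, the decoder input and target must be shifted by exactly one position; if they are aligned (or shifted twice), the model is trained to predict the wrong position and generation stays near random. A sketch with a unit-test-style check; the token ids are illustrative.

```python
import torch

def shift_for_teacher_forcing(seq: torch.Tensor):
    # seq: (batch, T) = [BOS, w1, ..., wn, EOS]
    # Decoder input drops the last token, target drops the first,
    # so the model at position t is trained to predict token t + 1.
    return seq[:, :-1], seq[:, 1:]

def test_shift_alignment():
    seq = torch.tensor([[1, 10, 11, 12, 2]])  # 1 = BOS, 2 = EOS (hypothetical ids)
    dec_in, tgt = shift_for_teacher_forcing(seq)
    # At every position the target equals the *next* decoder input token.
    assert torch.equal(dec_in[:, 1:], tgt[:, :-1])
    assert tgt[0, -1].item() == 2  # the model must learn to emit EOS

test_shift_alignment()
```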
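Example fix 3, gradient zeroing and scaler ordering under mixed precision: gradients must be zeroed every step, and clipping only makes sense after scaler.unscale_, otherwise the clip threshold is applied to scaled gradients and is effectively meaningless. A sketch of a correct step, assuming a generic encoder-decoder forward signature.

```python
import torch
from torch.cuda.amp import GradScaler, autocast  # torch.amp.* in newer PyTorch

scaler = GradScaler()

def train_step(model, src, tgt_in, tgt_out, optimizer, criterion, max_norm=1.0):
    optimizer.zero_grad(set_to_none=True)  # omitting this accumulates gradients
    with autocast():
        logits = model(src, tgt_in)
        loss = criterion(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)             # unscale BEFORE clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    scaler.step(optimizer)                 # step/update AFTER clipping
    scaler.update()
    assert torch.isfinite(loss), "loss went NaN/Inf"
    return loss.item()
```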
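Example fix 4, positional encoding shape/broadcast: a buffer built for seq-first (T, B, D) layout used with batch-first (B, T, D) input either errors out or, when the sizes happen to coincide, silently adds the wrong positions. A sketch of a batch-first sinusoidal module with a shape assertion, assuming an even d_model.

```python
import math
import torch

class SinusoidalPE(torch.nn.Module):
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        # (1, max_len, d_model) broadcasts cleanly over batch-first input.
        self.register_buffer("pe", pe.unsqueeze(0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        assert x.dim() == 3 and x.size(-1) == self.pe.size(-1), \
            f"expected (B, T, {self.pe.size(-1)}), got {tuple(x.shape)}"
        return x + self.pe[:, : x.size(1)]
```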
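Example fix 5, weight decay hygiene: with AdamW, biases and LayerNorm parameters should be excluded from weight decay; decaying them quietly degrades training. A sketch of parameter grouping, using a toy model only to keep the snippet self-contained.

```python
import torch

def build_param_groups(model: torch.nn.Module, weight_decay: float = 0.01):
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # 1-D parameters are biases and norm weights; they should not decay.
        (no_decay if p.dim() == 1 else decay).append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.LayerNorm(16))
optimizer = torch.optim.AdamW(build_param_groups(model), lr=3e-4)

# Unit-test-style check: every parameter in the no-decay group is 1-D.
for group in optimizer.param_groups:
    if group["weight_decay"] == 0.0:
        assert all(p.dim() == 1 for p in group["params"])
```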
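For the debugging plan, a copy-task sanity check separates wiring bugs from data problems: a correctly wired encoder-decoder should drive loss near zero when trained to reproduce a handful of random sequences. The model's forward signature and the threshold below are assumptions.

```python
import torch

def copy_task_sanity_check(model, vocab_size: int, steps: int = 200) -> bool:
    # If loss stays flat here, the bug is in the wiring (masks, label shift,
    # optimizer step), not in the real dataset.
    data = torch.randint(3, vocab_size, (32, 10))  # toy batch; ids >= 3 skip specials
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    crit = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad(set_to_none=True)
        logits = model(data, data[:, :-1])  # teacher-forced copy (assumed signature)
        loss = crit(logits.reshape(-1, vocab_size), data[:, 1:].reshape(-1))
        loss.backward()
        opt.step()
    return loss.item() < 0.1  # a healthy model overfits this almost exactly
```

Alongside this, monitor the loss curve, the gradient norm before and after clipping, the learning rate, and the AMP loss scale; a repeatedly collapsing loss scale is an early sign of numerical instability.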
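Finally, NaN detection: forward hooks flag the first module whose output goes non-finite, which localizes divergence far earlier than a NaN loss does; PyTorch's built-in anomaly mode does the same for the backward pass. A sketch:

```python
import torch

def attach_nan_hooks(model: torch.nn.Module):
    # Raise at the first module producing non-finite activations.
    def hook(module, inputs, output):
        outs = output if isinstance(output, tuple) else (output,)
        for o in outs:
            if torch.is_tensor(o) and not torch.isfinite(o).all():
                raise RuntimeError(
                    f"non-finite output in {module.__class__.__name__}"
                )
    for m in model.modules():
        m.register_forward_hook(hook)

# For the backward pass, torch.autograd.set_detect_anomaly(True) reports the
# op that produced a NaN gradient (slow; enable only while debugging).
```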