Explain LLM post-training methods and tradeoffs
Company: Scale AI
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: easy
Interview Round: Onsite
You are asked about **LLM post-training** (after pretraining on large corpora).
Explain a practical post-training pipeline for turning a base model into an instruction-following assistant.
Cover:
- Supervised fine-tuning (SFT): data types, formatting, and common failure modes.
- Preference optimization approaches: RLHF (reward model + RL) and direct preference optimization (e.g., pairwise preference loss).
- Safety/alignment steps (policy constraints, refusal behavior, red-teaming).
- How you would evaluate quality beyond loss (helpfulness, harmlessness, honesty, regression testing).
- Key tradeoffs: cost, stability, reward hacking, mode collapse, over-refusal, and distribution shift.
You may assume a decoder-only Transformer and conversational data.
Quick Answer: A practical pipeline starts with supervised fine-tuning (SFT) on curated instruction-response pairs and multi-turn conversations, rendered with a consistent chat template, to teach the base model the assistant format; common failure modes include low-quality or contaminated data, template mismatches, and overfitting to narrow styles. Preference optimization follows: either RLHF (train a reward model on human preference pairs, then optimize the policy with an RL algorithm such as PPO under a KL penalty to the SFT model) or a direct pairwise preference loss such as DPO, which skips the explicit reward model and RL loop. Safety and alignment work layers in policy-constrained data, calibrated refusal behavior, and red-teaming. Evaluation goes beyond loss: human or LLM-judge ratings of helpfulness and harmlessness, honesty and factuality checks, and regression suites on capability benchmarks. The key tradeoffs are RLHF's cost and training instability versus DPO's simplicity, reward hacking and mode collapse under aggressive optimization, over-refusal from heavy safety tuning, and distribution shift away from the pretraining and SFT data.
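To make the "direct preference optimization" point concrete, below is a minimal sketch of a pairwise preference (DPO-style) loss. It assumes PyTorch and that per-sequence log-probabilities of the chosen and rejected responses have already been computed under both the policy and a frozen reference (SFT) model; the function and argument names are illustrative, not taken from any specific library.

```python
import torch
import torch.nn.functional as F


def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_chosen | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_rejected | x), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,                    # strength of the implicit KL constraint to the reference
) -> torch.Tensor:
    # Implicit reward of each response: how much the policy's log-probability
    # has moved relative to the reference model, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin: push the policy to rank the preferred
    # response above the rejected one, without an explicit reward model or RL loop.
    margin = chosen_rewards - rejected_rewards
    return -F.logsigmoid(margin).mean()
```

The beta term plays the role of the KL penalty in RLHF: a larger beta keeps the policy closer to the reference model (less distribution shift, but weaker preference fitting), while a smaller beta allows larger moves and raises the risk of reward hacking and mode collapse discussed above.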