You are asked about LLM post-training (after pretraining on large corpora).
Explain a practical post-training pipeline for turning a base model into an instruction-following assistant.
Cover:
- Supervised fine-tuning (SFT): data types, formatting, and common failure modes (a chat-formatting sketch follows this list).
- Preference optimization approaches: RLHF (reward model + RL) and direct preference optimization (e.g., a DPO-style pairwise preference loss; see the loss sketch at the end).
- Safety/alignment steps (policy constraints, refusal behavior, red-teaming).
- How you would evaluate quality beyond loss (helpfulness, harmlessness, honesty, regression testing; a minimal regression-test sketch is also given at the end).
- Key tradeoffs: cost, stability, reward hacking, mode collapse, over-refusal, and distribution shift.
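As a concrete anchor for the SFT bullet, here is a minimal sketch of turning a conversation into a single training example. The ChatML-style role markers and the convention of masking non-assistant tokens with -100 are assumptions; the real tokens and masking rules depend on the tokenizer and training stack you describe.

```python
# Minimal SFT formatting sketch (assumed ChatML-style template; adjust tokens
# to match your tokenizer). Loss is typically computed only on assistant tokens.

from typing import Dict, List

def format_chat(messages: List[Dict[str, str]]) -> str:
    """Render a list of {role, content} messages into one training string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    return "".join(parts)

def build_labels(token_ids: List[int], assistant_mask: List[bool]) -> List[int]:
    """Mask prompt/user tokens with -100 so cross-entropy ignores them."""
    return [tid if keep else -100 for tid, keep in zip(token_ids, assistant_mask)]

example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize RLHF in one sentence."},
    {"role": "assistant", "content": "RLHF fine-tunes a model against a learned reward model of human preferences."},
]
print(format_chat(example))
```

A failure mode worth calling out in the answer is a train/serve mismatch in this template: the model is trained on one formatting convention but prompted with another at inference time.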
You may assume a decoder-only Transformer and conversational data.
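For the preference-optimization bullet, a minimal sketch of the DPO pairwise preference loss is below, computed from summed log-probabilities of the chosen and rejected responses under the trainable policy and a frozen reference model. The tensor names and the beta value are illustrative assumptions.

```python
# Sketch of the DPO pairwise loss:
# -log sigmoid(beta * ((log pi(y_w|x) - log pi_ref(y_w|x))
#                      - (log pi(y_l|x) - log pi_ref(y_l|x))))

import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen: torch.Tensor,
             policy_logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor,
             ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Pairwise preference loss over per-example summed response log-probs."""
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with random log-probs for a batch of 4 preference pairs.
batch = 4
loss = dpo_loss(torch.randn(batch), torch.randn(batch),
                torch.randn(batch), torch.randn(batch))
print(float(loss))
```

The reference model and the beta coefficient are what keep the policy from drifting arbitrarily far from its starting distribution, which is a natural hook for discussing reward hacking and distribution shift.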
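For the evaluation bullet, a behavioral regression suite can be sketched as simple checks over a fixed prompt set. The `generate` callable and the keyword-based refusal heuristic below are placeholders for whatever inference stack and refusal classifier you would actually use.

```python
# Sketch of a behavioral regression check run after each post-training stage.
# `generate` is a placeholder inference call; the keyword heuristic stands in
# for a real refusal classifier.

from typing import Callable, Dict

REFUSAL_MARKERS = ("i can't help", "i cannot help", "i won't assist")

def looks_like_refusal(text: str) -> bool:
    return any(marker in text.lower() for marker in REFUSAL_MARKERS)

def run_regression(generate: Callable[[str], str]) -> Dict[str, int]:
    must_refuse = ["<placeholder: a request your policy clearly disallows>"]
    must_answer = ["Explain how HTTPS certificate validation works."]
    return {
        "over_refusal": sum(looks_like_refusal(generate(p)) for p in must_answer),
        "missed_refusal": sum(not looks_like_refusal(generate(p)) for p in must_refuse),
    }

# Toy usage with a stub model that refuses everything (flags over-refusal).
print(run_regression(lambda prompt: "I can't help with that."))
```

Tracking both counters over successive checkpoints is one simple way to surface over-refusal and safety regressions that aggregate loss curves will not show.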