This Machine Learning question evaluates a practitioner's knowledge of LLM post-training methods: supervised fine-tuning, preference optimization (RLHF and direct preference losses), safety and alignment interventions, and evaluation beyond loss.
You are asked about LLM post-training, i.e., the stages that follow pretraining on large corpora.
Explain a practical post-training pipeline for turning a base model into an instruction-following assistant.
Cover:
- supervised fine-tuning (SFT) on instruction or conversational data
- preference optimization (RLHF with a reward model, or a direct preference loss such as DPO)
- safety and alignment interventions
- evaluation beyond validation loss
You may assume a decoder-only Transformer and conversational data.
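As a concrete illustration of one topic a strong answer would cover, here is a minimal sketch of a direct preference loss in the style of DPO. The function name, the use of scalar summed log-probabilities, and the default `beta` are illustrative assumptions for this sketch, not part of the question itself.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sketch of a DPO-style preference loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    trainable policy and under a frozen reference model.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as log1p(exp(-margin)) for stability.
    return math.log1p(math.exp(-margin))
```

The loss shrinks as the policy assigns relatively more probability to the chosen response (versus the reference) than to the rejected one; with no margin it equals ln 2, matching a coin-flip preference.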