Explain LLM fine-tuning and generative models
Company: Google
Role: Software Engineer
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
## ML Fundamentals — LLM & Generative AI Track
You are interviewing for an ML-focused engineering role. After a short résumé walkthrough, the interviewer dives into ML fundamentals on the **LLM / Generative AI** track. The conversation has two parts: adapting pretrained LLMs to downstream tasks, and comparing a family of latent-variable generative models.
This is a conceptual interview — there is no coding required. You are expected to reason out loud, organize your answer into a clear taxonomy, and back each choice with concrete trade-offs.
### Constraints & Assumptions
- Treat this as a whiteboard / discussion question — depth of reasoning and clear trade-off articulation matter more than memorized facts.
- Assume access to a strong open-weight base model (e.g., a decoder-only transformer) plus typical industry training infrastructure (multi-GPU nodes, an experiment tracker, an eval harness).
- "Downstream task" can mean instruction following, domain Q&A, structured extraction, classification, or stylistic adaptation — keep your taxonomy general but call out where a method shines.
- For Part B, focus on the latent-variable formulation and training objective rather than any single published architecture.
### Clarifying Questions to Ask
A strong candidate scopes the problem before answering. Reasonable questions up front:
- For the fine-tuning task: how much labeled (or preference) data is available, and how clean is it?
- What are the deployment constraints — latency budget, cost per request, on-device vs. server, number of concurrent task variants?
- Does the task require up-to-date / frequently-changing knowledge, or is it stable?
- Are there safety, alignment, or compliance requirements (refusals, toxicity, PII handling)?
- For the generative-model comparison: what's the target modality (images, audio, discrete tokens) and the primary goal — representation learning, sampling quality, or compression?
### Part A — Adapting / Fine-tuning a Pretrained LLM
Walk through the common ways to adapt a pretrained LLM to a downstream task. For **each** approach, explain (1) how it works mechanically, (2) its main pros and cons, and (3) when you would choose it. Then reason through how your recommendation changes under these practical scenarios:
- **Limited labeled data.**
- **Strict latency / cost constraints.**
- **Domain adaptation without forgetting general capabilities** (catastrophic forgetting).
- **Safety / alignment requirements.**
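Mechanically, supervised fine-tuning (SFT), whether full or parameter-efficient, typically minimizes next-token cross-entropy on the target response only and masks out the prompt. No code is expected in the interview, so treat this as an illustrative sketch. It is pure Python with a toy integer vocabulary, and the helper names are hypothetical. The `-100` sentinel mirrors the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import math

IGNORE_INDEX = -100  # matches the default ignore_index of torch.nn.CrossEntropyLoss


def build_example(prompt_ids: list[int], response_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate prompt and response; mask prompt positions so only the response is supervised."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_nll(token_logprobs: list[dict[int, float]], labels: list[int]) -> float:
    """Mean negative log-likelihood over supervised (unmasked) positions.

    token_logprobs[t] maps token id -> model log-probability for the token at position t
    (assumed already aligned with labels, i.e. the usual next-token shift is done upstream).
    """
    total, count = 0.0, 0
    for dist, label in zip(token_logprobs, labels):
        if label == IGNORE_INDEX:
            continue  # prompt tokens contribute no loss
        total -= dist[label]
        count += 1
    return total / max(count, 1)


# Toy usage: a 10-token vocabulary and a model that is uniform over it.
input_ids, labels = build_example([5, 6, 7], [8, 9])
uniform = {tok: math.log(1 / 10) for tok in range(10)}
loss = masked_nll([uniform] * len(labels), labels)
print(labels)  # [-100, -100, -100, 8, 9]
print(round(loss, 4))  # 2.3026, i.e. log(10)
```

Masking the prompt keeps the model from spending capacity on re-predicting instructions it will always be given. That matters most when labeled data is limited.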
```hint Organize by how much you change the model
There is a natural spectrum from touching nothing to rewriting everything. Structuring your taxonomy around that spectrum — rather than jumping straight to examples — signals clarity of thought and makes the trade-offs easier to compare systematically.
```
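LoRA is a representative parameter-efficient method in the middle of that spectrum. It freezes a pretrained weight W and learns a low-rank update (alpha / r) * B A. The sketch below is illustrative pure Python with toy sizes and no training loop, and the class and helper names are hypothetical. It shows three things:

- the zero-initialized B;
- the adapter forward pass;
- why merging B A into W leaves inference latency unchanged.

```python
import random


def matvec(M: list[list[float]], x: list[float]) -> list[float]:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def matmul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W: list[list[float]], r: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pretrained weight, never updated
        self.scale = alpha / r
        # A: r x d_in with small random init; B: d_out x r with zero init, so training starts at the base model
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def merged_weight(self) -> list[list[float]]:
        """Fold the adapter into W: serving then costs one dense matmul, same as the base model."""
        BA = matmul(self.B, self.A)
        return [[w + self.scale * u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(self.W, BA)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


W = [[float(i == j) for j in range(8)] for i in range(8)]  # toy 8x8 "pretrained" weight
layer = LoRALinear(W, r=2, alpha=4)
x = [1.0] * 8
assert layer.forward(x) == matvec(W, x)  # zero-initialized B: the adapter starts as a no-op
layer.B = [[0.1 * (i + 1), -0.05] for i in range(8)]  # stand-in for trained values
y_adapter, y_merged = layer.forward(x), matvec(layer.merged_weight(), x)
print(layer.trainable_params())  # 32, versus 64 entries in W
```

At realistic sizes the ratio is far more dramatic, because the trainable count grows with r * (d_in + d_out) rather than d_in * d_out.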
```hint Comparison axes keep the answer crisp
Pick a small, consistent set of axes and score every method against them, so the interviewer can follow the comparison without tracking scattered facts. Think about what practitioners actually have to budget or worry about when deploying an adapted model.
```
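Trainable parameters and serving storage are two axes practitioners actually budget. Back-of-envelope arithmetic makes the spread concrete. The assumptions are loudly hypothetical:

- a 7B-parameter model with 32 layers and d_model = 4096;
- 2-byte weights;
- LoRA rank 16 on the query and value projections only;
- 20 prompt-tuning tokens;
- 24 task variants.

The resulting numbers are illustrative, not measurements.

```python
BYTES_PER_PARAM = 2  # assumed bf16/fp16 storage for weights


def lora_params(n_layers: int, d_model: int, rank: int, matrices_per_layer: int) -> int:
    # each adapted d_model x d_model matrix gets A (rank x d_model) and B (d_model x rank)
    return n_layers * matrices_per_layer * rank * (d_model + d_model)


def adaptation_budget(total_params: int, n_layers: int, d_model: int,
                      n_variants: int) -> dict[str, tuple[int, int]]:
    """Map method -> (trainable params per variant, weight bytes needed to serve n_variants)."""
    trainable = {
        "prompting / in-context learning": 0,
        "prompt tuning (20 virtual tokens)": 20 * d_model,
        "LoRA r=16 on q and v": lora_params(n_layers, d_model, rank=16, matrices_per_layer=2),
        "full fine-tuning": total_params,
    }
    budget = {}
    for method, n_train in trainable.items():
        if method == "full fine-tuning":
            served = n_variants * total_params  # one full model copy per variant
        else:
            served = total_params + n_variants * n_train  # one shared frozen base plus small deltas
        budget[method] = (n_train, served * BYTES_PER_PARAM)
    return budget


# Hypothetical 7B-parameter model: 32 layers, d_model = 4096, 24 task variants.
for method, (n_train, n_bytes) in adaptation_budget(7_000_000_000, 32, 4096, 24).items():
    print(f"{method:36s} trainable={n_train:>13,d}  weights to serve={n_bytes / 1e9:7.1f} GB")
```

Training-time memory (optimizer state, activations) is a separate axis that this weight-only arithmetic ignores.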
```hint Each scenario maps to a specific underlying reason
For each scenario, ask *why* it constrains the choice — and let that reason point you to the right class of mechanism. The four scenarios each stress a different dimension of the spectrum.
```
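Preference-based alignment trains on (chosen, rejected) response pairs rather than on single targets. One widely used instance is Direct Preference Optimization (DPO). It is an alternative to RLHF, which instead trains a separate reward model and then runs PPO. Below is a minimal sketch of the DPO per-pair loss. It assumes the sequence log-probabilities are already computed and uses beta = 0.1 as an illustrative value.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float, beta: float = 0.1) -> float:
    """DPO loss for one (prompt, chosen, rejected) triple.

    Inputs are summed log-probabilities of the full chosen / rejected responses under the
    trainable policy and under a frozen reference model (typically the SFT model).
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) computed stably as softplus(-z) = max(-z, 0) + log1p(exp(-|z|))
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: no preference learned yet, loss = log 2.
print(round(dpo_loss(-42.0, -40.0, -42.0, -40.0), 4))  # 0.6931
# Policy raised the chosen response's log-prob by 10 nats relative to the reference: loss drops.
print(round(dpo_loss(-32.0, -40.0, -42.0, -40.0), 4))  # 0.3133
```

The reference-model terms play a role similar to the KL penalty in RLHF. Beta controls how far the policy may drift from the reference, which also limits collateral damage to general capabilities.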
#### What This Part Should Cover
- A structured taxonomy that spans the full range of adaptation options (from no parameter updates through preference-based alignment), with the organizing principle made explicit.
- Accurate mechanics, honest trade-offs on consistent axes, and a clear decision rule per scenario — not just naming methods.
- Awareness of catastrophic forgetting and how to detect and mitigate it; mention of an evaluation strategy across the adaptation lifecycle.
- Correct mechanics for PEFT variants (distinguishing them from alignment methods) and a clear account of when RAG and fine-tuning are complementary rather than competing.
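Detecting forgetting means scoring the adapted model on a held-out general-capability suite, not only on the target task, and comparing the result against the base model. One common mitigation is replaying general-domain data during fine-tuning. Others include lower learning rates, PEFT, and regularization toward the base weights. In the minimal sketch below, the task names, scores, tolerance and replay fraction are all hypothetical placeholders.

```python
import random


def forgetting_report(base_scores: dict[str, float], tuned_scores: dict[str, float],
                      tolerance: float = 0.02) -> dict[str, float]:
    """Held-out general tasks whose score fell by more than `tolerance` (absolute) after adaptation."""
    return {
        task: round(base_scores[task] - tuned_scores[task], 4)
        for task in base_scores
        if base_scores[task] - tuned_scores[task] > tolerance
    }


def mixed_batches(task_examples: list, general_examples: list, batch_size: int,
                  replay_fraction: float, seed: int = 0):
    """Yield shuffled batches where round(batch_size * replay_fraction) slots hold replayed general data."""
    rng = random.Random(seed)
    n_replay = round(batch_size * replay_fraction)
    n_task = batch_size - n_replay
    if n_task <= 0 or n_replay < 0:
        raise ValueError("each batch needs at least one task example and a non-negative replay count")
    task = list(task_examples)
    rng.shuffle(task)
    # trailing task examples that do not fill a whole batch are dropped in this sketch
    for start in range(0, len(task) - n_task + 1, n_task):
        batch = task[start:start + n_task] + rng.sample(general_examples, n_replay)
        rng.shuffle(batch)
        yield batch


# Hypothetical scores on a held-out general suite, before and after domain fine-tuning.
base = {"general_qa": 0.71, "code": 0.45, "math": 0.38}
tuned = {"general_qa": 0.70, "code": 0.39, "math": 0.37}
print(forgetting_report(base, tuned))  # {'code': 0.06}

task_data = [f"task-{i}" for i in range(10)]
general_data = [f"gen-{i}" for i in range(100)]
batches = list(mixed_batches(task_data, general_data, batch_size=4, replay_fraction=0.25))
```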
### Part B — Latent-Variable Generative Models: AE vs. VAE vs. VQ-VAE
Explain and compare the **Autoencoder (AE)**, the **Variational Autoencoder (VAE)**, and the **Vector-Quantized VAE (VQ-VAE)**. For each, cover the training objective, what happens during training, typical failure modes, and common use cases — and be precise about *why* a plain AE is not a proper generative model whereas a VAE is.
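The loss is where "generative" comes from. A plain AE minimizes reconstruction error only. It therefore has no prior, and nothing guarantees that a randomly chosen latent decodes to anything sensible. A VAE adds a KL term that pulls each approximate posterior q(z|x) toward the prior p(z) = N(0, I). At generation time you can then draw z from N(0, I) and decode it. Below is an illustrative pure-Python sketch of the per-example terms for a diagonal-Gaussian encoder. The networks are omitted and the function names are hypothetical.

```python
import math
import random


def reconstruction_error(x: list[float], x_hat: list[float]) -> float:
    """Squared error. For a plain AE this is the entire loss, so nothing shapes where codes land."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def reparameterize(mu: list[float], log_var: list[float], rng: random.Random) -> list[float]:
    """z = mu + sigma * eps with eps ~ N(0, I): the noise is an input, so z is differentiable in mu, log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: list[float], log_var: list[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_loss(x: list[float], x_hat: list[float], mu: list[float], log_var: list[float],
             beta: float = 1.0) -> float:
    """Negative ELBO up to a scale on the reconstruction term and additive constants, assuming a
    Gaussian decoder with fixed variance; beta = 1 is the standard VAE."""
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]         # pretend encoder outputs for one input
z = reparameterize(mu, log_var, rng)            # training: sample from q(z|x)
z_new = [rng.gauss(0.0, 1.0) for _ in mu]       # generation: sample from the prior, then decode
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0: posterior equals prior
print(round(kl_to_standard_normal(mu, log_var), 4))    # 1.1927
```

Drawing eps outside the computation is the reparameterization trick. Without it, gradients would have to flow through a random sampling step, which is not differentiable.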
```hint What makes something "generative"
The distinction between a model that compresses-and-reconstructs and one that can generate new samples hinges on what the latent space looks like and whether you can meaningfully sample from it. Think about what property the latent space needs to have, and how the training objective creates (or fails to create) that property.
```
```hint Each model has a signature loss structure
The three models differ most starkly in their training objectives — specifically in whether the loss is purely reconstruction-focused or adds terms that impose structure on the latent. Work out the loss for each model before reasoning about its failure modes and use cases.
```
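The nearest-neighbour lookup is an argmin, so nothing differentiable connects the decoder back to the encoder through it. VQ-VAE copes with two devices:

- a straight-through gradient, which copies the decoder's gradient past the lookup onto the encoder output;
- two stop-gradient terms, one that trains the codebook and one that keeps encoder outputs committed to it.

The commitment weight beta is commonly set around 0.25. Plain Python has no autograd, so this illustrative sketch writes the gradients out by hand. The names are hypothetical, and the reconstruction gradient is supplied as a stand-in for what a decoder would produce.

```python
def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z_e: list[float], codebook: list[list[float]]) -> tuple[int, list[float]]:
    """Nearest-neighbour lookup: index and value of the closest code (an argmin, so no gradient)."""
    k = min(range(len(codebook)), key=lambda i: sq_dist(z_e, codebook[i]))
    return k, codebook[k]


def vq_terms_and_grads(z_e, codebook, grad_recon_wrt_zq, beta=0.25):
    """Codebook and commitment terms, plus hand-derived gradients showing where stop-gradient routes them.

    codebook loss    ||sg[z_e] - e||^2        -> gradient reaches only the selected code e
    commitment loss  beta * ||z_e - sg[e]||^2 -> gradient reaches only the encoder output z_e
    reconstruction   straight-through: dL_recon/dz_q is copied onto z_e unchanged
    """
    k, e = quantize(z_e, codebook)
    codebook_loss = sq_dist(z_e, e)
    commitment_loss = beta * sq_dist(z_e, e)  # same value up to beta; differs only in which side gets gradients
    grad_e = [2.0 * (ei - zi) for ei, zi in zip(e, z_e)]
    grad_z_e = [g + 2.0 * beta * (zi - ei) for g, zi, ei in zip(grad_recon_wrt_zq, z_e, e)]
    return k, codebook_loss, commitment_loss, grad_e, grad_z_e


codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
k, cb_loss, commit_loss, grad_e, grad_z_e = vq_terms_and_grads(
    [0.9, 1.2], codebook, grad_recon_wrt_zq=[0.5, -0.5])
print(k)  # 1
```

Some implementations replace the codebook-loss term with an exponential-moving-average update of the codes, which serves the same purpose.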
```hint Failure modes are a discriminator
Each model has a characteristic way it breaks down at training time, and each failure mode is a direct consequence of how that model's objective is set up. Naming the failure mode *and* tracing it back to the objective signals genuine understanding.
```
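Posterior collapse traces straight back to the VAE objective. If the decoder can model x well without z, the cheapest way to lower the loss is to drive KL(q(z|x) || p(z)) to zero, and z then carries no information. Below is an illustrative sketch of a per-dimension KL diagnostic, plus linear KL annealing, which is one common mitigation alongside free bits or weaker decoders. The 0.01-nat threshold and the warm-up length are assumptions, and the function names are hypothetical.

```python
import math


def kl_per_dim(mu_batch: list[list[float]], log_var_batch: list[list[float]]) -> list[float]:
    """Batch-averaged KL(q(z_j | x) || N(0, 1)) for each latent dimension j."""
    n, d = len(mu_batch), len(mu_batch[0])
    totals = [0.0] * d
    for mu, lv in zip(mu_batch, log_var_batch):
        for j in range(d):
            totals[j] += 0.5 * (math.exp(lv[j]) + mu[j] ** 2 - 1.0 - lv[j])
    return [t / n for t in totals]


def collapsed_dims(mu_batch, log_var_batch, threshold: float = 0.01) -> list[int]:
    """Dimensions carrying (almost) no information: the posterior is ~ the prior for every input."""
    return [j for j, kl in enumerate(kl_per_dim(mu_batch, log_var_batch)) if kl < threshold]


def kl_weight(step: int, warmup_steps: int) -> float:
    """Linear KL annealing: the weight ramps 0 -> 1 so the decoder learns to use z before the penalty bites."""
    return min(1.0, step / warmup_steps)


# Pretend encoder outputs for two inputs: dimension 0 matches the prior exactly, dimension 1 is informative.
mu_batch = [[0.0, 1.5], [0.0, -1.2]]
log_var_batch = [[0.0, -1.0], [0.0, -1.0]]
print(collapsed_dims(mu_batch, log_var_batch))       # [0]
print([kl_weight(s, 1000) for s in (0, 500, 2000)])  # [0.0, 0.5, 1.0]
```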
#### What This Part Should Cover
- **AE:** why a plain AE does not qualify as a generative model; what the reconstruction-only objective does and does not guarantee about the latent space; and when an AE is still useful despite this limitation.
- **VAE:** the probabilistic formulation, how the training objective balances reconstruction and regularization, what makes the reparameterization trick necessary, and how each failure mode traces back to the objective.
- **VQ-VAE:** why a discrete codebook is used, how the straight-through gradient estimator and the three-term loss together cope with the non-differentiable nearest-neighbor lookup, and how codebook failure manifests.
- A succinct side-by-side contrast covering latent type (continuous vs. discrete), sample quality, and training stability.
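Codebook collapse shows up as a few codes absorbing most assignments while the rest go unused ("dead" codes). This caps the effective capacity of the discrete latent. Below is an illustrative sketch of two usage diagnostics worth logging during VQ-VAE training, with a hypothetical function name. Re-initializing dead codes from recent encoder outputs is one possible mitigation, not the only one.

```python
import math
from collections import Counter


def codebook_stats(assignments: list[int], codebook_size: int) -> tuple[float, list[int]]:
    """Usage diagnostics for a batch of VQ code indices.

    perplexity = exp(entropy of the code-usage distribution): it equals codebook_size under
    perfectly uniform use and 1.0 when a single code absorbs everything (codebook collapse).
    """
    counts = Counter(assignments)
    n = len(assignments)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    dead = [k for k in range(codebook_size) if counts[k] == 0]
    return math.exp(entropy), dead


healthy = [0, 1, 2, 3] * 25   # every code used equally
collapsed = [2] * 100         # a single code absorbs all assignments
print(codebook_stats(healthy, 4))
print(codebook_stats(collapsed, 4))  # (1.0, [0, 1, 3])
```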
### Follow-up Questions
- When would you choose **RAG over fine-tuning** (and vice versa), and how would you evaluate a RAG system end-to-end?
- For a multi-tenant product needing **dozens of task-specific variants** on a tight budget, which adaptation strategy do you pick and why?
- A VAE trained on images produces **blurry, low-diversity samples**. Diagnose the likely causes and list concrete fixes.
- How does VQ-VAE relate to using **discrete tokens as input to a downstream autoregressive (transformer) generator**, and why is that combination powerful?
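End-to-end RAG evaluation usually separates retrieval quality from answer quality. Answer quality covers faithfulness to the retrieved context and answer correctness, and is often judged by humans or a model-based grader. The retrieval half can be scored with standard ranking metrics. Below is a minimal sketch using hypothetical query and document IDs with hand-labeled relevant passages.

```python
def hit_rate_at_k(retrieved: dict[str, list[str]], relevant: dict[str, set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant passage among the top-k retrieved."""
    hits = sum(1 for q, docs in retrieved.items() if relevant[q] & set(docs[:k]))
    return hits / len(retrieved)


def mean_reciprocal_rank(retrieved: dict[str, list[str]], relevant: dict[str, set[str]]) -> float:
    """Average of 1 / rank of the first relevant passage (0 when none is retrieved)."""
    total = 0.0
    for q, docs in retrieved.items():
        for rank, doc in enumerate(docs, start=1):
            if doc in relevant[q]:
                total += 1.0 / rank
                break
    return total / len(retrieved)


# Hypothetical retrieval results and relevance labels for three queries.
retrieved = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"], "q3": ["d5", "d6", "d8"]}
relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d0"}}
print(hit_rate_at_k(retrieved, relevant, k=3))     # 2/3: q3's relevant passage is never retrieved
print(mean_reciprocal_rank(retrieved, relevant))  # 0.5
```

Low retrieval scores point to chunking, embedding or index problems, while good retrieval with poor answers points to the generator. That split is what lets you decide whether to fix retrieval, fine-tune, or both.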
Quick Answer: This question evaluates understanding of LLM adaptation techniques and trade-offs (fine-tuning and parameter-efficient methods) alongside knowledge of generative model families (AE, VAE, VQ-VAE), covering objectives, training behavior, typical failure modes, and considerations like limited data, latency/cost, domain adaptation, and safety.