LLM Fundamentals — Onsite Interview Task
Context: Assume a modern transformer-based LLM. Provide precise, concise explanations with examples and trade-offs.
- Subword tokenization (e.g., BPE): How does it work and why is it used?
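A minimal sketch of the BPE training loop, assuming a word-frequency dict as input (the `bpe_train` helper is illustrative; production tokenizers add byte-level fallback, pre-tokenization, and special tokens):

```python
from collections import Counter

def bpe_train(word_freqs, num_merges):
    """Toy BPE trainer: words start as character tuples; repeatedly merge
    the most frequent adjacent symbol pair into a new token."""
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for syms, freq in vocab.items():
            for a, b in zip(syms, syms[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent pair becomes one symbol
        merges.append(best)
        new_vocab = {}
        for syms, freq in vocab.items():
            out, i = [], 0
            while i < len(syms):
                if i + 1 < len(syms) and (syms[i], syms[i + 1]) == best:
                    out.append(syms[i] + syms[i + 1])   # apply the merge
                    i += 2
                else:
                    out.append(syms[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

print(bpe_train({"lower": 5, "lowest": 2, "newer": 6, "wider": 3}, 8))
# e.g. [('e', 'r'), ('er', '_')...] — frequent fragments become single tokens
```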
- Self-attention: Explain the mechanism and its O(n^2) cost in sequence length. Discuss techniques that reduce it (e.g., sparse attention, sliding windows) and how KV caching avoids recomputing past keys and values during decoding.
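A sketch of scaled dot-product attention with an optional causal sliding-window mask as one sparsity example (the `attention` signature and `window` parameter are illustrative, not from any particular library):

```python
import numpy as np

def attention(Q, K, V, window=None):
    """Scaled dot-product attention; the score matrix is (n, n), hence O(n^2).
    `window` applies a causal sliding-window mask as one sparsity example."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)              # (n, n): the quadratic term
    if window is not None:
        i = np.arange(n)[:, None]
        j = np.arange(n)[None, :]
        # Keep only keys at or within `window` positions behind the query.
        mask = (j > i) | (j < i - window + 1)
        scores = np.where(mask, -np.inf, scores)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                               # (n, d)

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V, window=3)  # each token attends to at most 3 positions
```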
- Contrast pretraining, instruction tuning, and RLHF/DPO.
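One way to make the RLHF/DPO contrast concrete: a sketch of the DPO loss for a single preference pair, assuming summed token log-probabilities under the policy and the frozen reference (SFT) model are precomputed:

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) pair.
    logp_* are summed token log-probs of a response under the policy;
    ref_logp_* are the same under the frozen reference model."""
    # Implicit reward = beta * log-ratio between policy and reference;
    # the loss pushes the chosen/rejected reward margin apart.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))   # -log(sigmoid(margin))

# Policy slightly prefers the chosen response relative to the reference,
# slightly disprefers the rejected one -> modest loss (~0.6 here).
print(dpo_loss(logp_w=-12.0, logp_l=-15.0, ref_logp_w=-13.0, ref_logp_l=-14.0))
```

Unlike RLHF with PPO, there is no separate reward model or sampling loop at training time; the preference signal is folded directly into this supervised objective.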
- Describe a RAG architecture. Compare indexing choices (BM25 vs. dense retrieval), chunking strategies, and embedding models. Explain how retrieval quality affects generation.
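A toy retrieve-then-generate pipeline. The `embed` function here is a deterministic placeholder (a real system calls an embedding model), so the snippet shows the pipeline shape, not real semantic ranking:

```python
import numpy as np

DIM = 64

def embed(text):
    """Stand-in for an embedding model: deterministic random unit vector."""
    rng = np.random.default_rng(sum(text.encode()))
    v = rng.standard_normal(DIM)
    return v / np.linalg.norm(v)

chunks = [
    "BM25 scores chunks by term overlap, weighted by term rarity.",
    "Dense retrieval embeds queries and chunks into one vector space.",
    "Smaller chunks improve precision but can lose surrounding context.",
]
index = np.stack([embed(c) for c in chunks])      # (num_chunks, DIM)

def retrieve(query, k=2):
    """Cosine similarity = dot product of unit vectors; return top-k chunks."""
    sims = index @ embed(query)
    top = np.argsort(-sims)[:k]
    return [chunks[i] for i in top]

question = "How does dense retrieval work?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
# `prompt` would then be sent to the generator LLM; if retrieval misses the
# relevant chunk, the generator has nothing correct to ground its answer in.
```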
- When do you use prompting vs. fine-tuning vs. adapters (e.g., LoRA)?
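A sketch of the LoRA idea, assuming a frozen weight matrix W with a trainable low-rank update B @ A added on top (class and parameter names are illustrative):

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (rank r << d)."""
    def __init__(self, W, r=4, alpha=8):
        d_out, d_in = W.shape
        self.W = W                        # frozen pretrained weight
        self.A = np.random.default_rng(0).standard_normal((r, d_in)) * 0.01
        self.B = np.zeros((d_out, r))     # zero init: layer starts identical to W
        self.scale = alpha / r

    def __call__(self, x):
        # During training only A and B receive gradients; W stays fixed.
        return x @ self.W.T + (x @ self.A.T) @ self.B.T * self.scale

W = np.random.default_rng(1).standard_normal((16, 16))
layer = LoRALinear(W, r=4)
y = layer(np.ones((2, 16)))
# Trainable params: r*(d_in + d_out) = 128 here, vs. d_in*d_out = 256 for
# full fine-tuning of this layer; the gap widens sharply at real model sizes.
```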
- Low-latency inference: Explain quantization, KV caching, and batching.
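Two of these in miniature: absmax int8 quantization of a weight matrix, and a toy decode loop where cached keys/values make each step O(t) instead of re-running attention over the whole prefix (the projections are placeholders for learned Wq/Wk/Wv):

```python
import numpy as np

# -- Quantization: absmax int8 round-trip for one weight matrix -------------
def quantize_int8(W):
    scale = np.abs(W).max() / 127.0
    return np.round(W / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
Wq, s = quantize_int8(W)
W_hat = Wq.astype(np.float32) * s   # dequantize; small rounding error remains

# -- KV cache: reuse past keys/values during autoregressive decoding --------
def attend(q, K, V):
    """Single-query attention over all cached keys/values."""
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max()); w /= w.sum()
    return w @ V

d = 8
K_cache, V_cache = [], []           # grows by one entry per generated token
x = rng.standard_normal(d)          # hidden state of the current token
for t in range(5):
    q = k = v = x                   # placeholder projections
    K_cache.append(k); V_cache.append(v)
    # Reuse cached K/V: O(t) work per step, no recompute over the prefix.
    out = attend(q, np.stack(K_cache), np.stack(V_cache))
    x = rng.standard_normal(d)      # stand-in for the next token's hidden state
```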
- How do you evaluate LLMs (task-specific metrics, human eval) and mitigate hallucinations and safety risks?
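For the task-specific-metrics part, a sketch of SQuAD-style exact match and token-level F1 (the normalization here is a simplification of the official script); hallucination and safety mitigation need grounding, citation checks, and human review beyond string metrics:

```python
import re
from collections import Counter

def normalize(s):
    """Lowercase, strip punctuation and extra whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", s.lower()).split())

def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))

def token_f1(pred, gold):
    """Harmonic mean of token precision and recall against the reference."""
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris.", "paris"))                   # 1.0
print(round(token_f1("in Paris, France", "Paris"), 2))  # 0.5
```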