PracHub

Explain NLP/RL concepts used in LLM agents

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in transformer-based NLP, embedding methods, LLM agent architecture and evaluation, retrieval techniques for RAG, and reinforcement learning fundamentals. Specifically, it tests understanding of model families, static vs. contextual embeddings, agent components and evaluation metrics, lexical vs. dense retrieval, BM25, and the on-policy/off-policy distinction in Q-learning. It is commonly asked to assess an applied machine learning engineer's ability to reason about trade-offs and design choices across machine learning, natural language processing, information retrieval, and reinforcement learning, emphasizing both conceptual understanding and practical application.



Company: Amazon

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite


Related Interview Questions

  • Explain Core ML Interview Concepts - Amazon (hard)
  • Evaluate NLP Classification Models - Amazon (easy)
  • Explain overfitting, regularization, and LLM techniques - Amazon (medium)
  • Design and evaluate a RAG system - Amazon (easy)
  • Design a search relevance prediction approach - Amazon (medium)
Date: Feb 9, 2026

You are interviewing for an applied ML role focused on LLM agents and retrieval-augmented generation (RAG). Answer the following conceptual questions clearly and with examples:

Transformer/NLP foundations

  1. Encoder-only vs encoder–decoder vs decoder-only architectures:
    • What are the key differences in objective, attention pattern, and typical use-cases?
    • Give representative model families for each.
    • For tasks like classification, translation, and open-ended generation, which would you choose and why?
  2. Word2Vec:
    • Explain how Word2Vec learns embeddings (CBOW/Skip-gram; negative sampling or hierarchical softmax).
    • Contrast static embeddings (e.g., Word2Vec) with contextual embeddings (e.g., Transformer-based). When does each fail?
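To make the Word2Vec part concrete, here is a minimal skip-gram with negative sampling sketch in NumPy. The corpus, embedding dimension, and hyperparameters are toy values chosen for illustration, not a tuned implementation; it only shows the mechanics (two embedding tables, positive/negative pairs, logistic loss gradients):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus and vocabulary (illustrative data, not from the question).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8  # vocab size, embedding dimension

# Skip-gram keeps two tables: center ("input") and context ("output") vectors.
W_in = rng.normal(0, 0.1, (V, D))
W_out = rng.normal(0, 0.1, (V, D))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window, k = 0.05, 2, 3  # learning rate, context window, negatives per pair
for _ in range(200):
    for pos, center in enumerate(corpus):
        c = idx[center]
        for off in range(-window, window + 1):
            j = pos + off
            if off == 0 or j < 0 or j >= len(corpus):
                continue
            o = idx[corpus[j]]
            # Negative sampling: k random words as "fake" contexts
            # (a real implementation samples from a unigram^0.75 distribution
            # and excludes the true context word).
            pairs = [(o, 1.0)] + [(n, 0.0) for n in rng.integers(0, V, size=k)]
            for tgt, label in pairs:
                u = W_out[tgt].copy()
                grad = sigmoid(W_in[c] @ u) - label  # d(BCE)/d(score)
                W_out[tgt] -= lr * grad * W_in[c]
                W_in[c] -= lr * grad * u

# Static embeddings: one vector per word type, regardless of sentence context.
def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

After training, `W_in` is the embedding table; comparing `cos(W_in[idx["cat"]], W_in[idx["dog"]])` against unrelated pairs illustrates what "static" means: the vector for "cat" is the same in every sentence, which is exactly where contextual embeddings differ.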

LLM agents

  1. LLM-as-a-judge / LLM-based evaluation:
    • How would you use an LLM to evaluate agent outputs?
    • What failure modes (bias, verbosity preference, prompt sensitivity, leakage) and mitigations would you consider?
    • What metrics would you report for agent quality (task success, tool-use correctness, groundedness, etc.)?
  2. ReAct:
    • Explain how the ReAct paradigm works at a high level.
    • Why can interleaving reasoning + actions help compared to pure “think then answer”?
  3. Agent vs LLM:
    • What is the fundamental difference between an “agent” and a standalone LLM?
    • Name and explain common agent components (e.g., goal, planner, tool interface, memory, policy/executor, evaluator).
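The ReAct loop in question 2 can be sketched in a few lines. Everything here is a hypothetical stand-in: `fake_llm` is a scripted policy playing the role of the model, and `lookup` is a hard-coded tool, not a real API. The point is the control flow: thoughts and actions interleave, and each observation feeds the next reasoning step:

```python
def lookup(query: str) -> str:
    """Toy tool: a hard-coded knowledge base standing in for real retrieval."""
    kb = {"capital of france": "Paris"}
    return kb.get(query.lower(), "no result")

def fake_llm(transcript: str) -> str:
    """Scripted stand-in for an LLM: act first, finish once a fact is observed."""
    if "Observation:" not in transcript:
        return "Thought: I need a fact.\nAction: lookup[capital of france]"
    return "Thought: I have the fact.\nFinish[Paris]"

def react_agent(question: str, max_steps: int = 4) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        transcript += "\n" + step
        if "Finish[" in step:
            return step.split("Finish[", 1)[1].rstrip("]")
        if "Action: lookup[" in step:
            query = step.split("lookup[", 1)[1].rstrip("]")
            # The observation is appended to the transcript, so the next
            # "Thought" can condition on it -- the core of ReAct.
            transcript += f"\nObservation: {lookup(query)}"
    return "gave up"
```

Contrast this with pure "think then answer": without the observation step, the model must commit to an answer from parametric memory alone and cannot correct course when a tool call contradicts its plan.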

Retrieval for RAG

  1. Lexical (sparse) vs dense retrieval:
    • Define lexical-based retrieval and dense-based retrieval.
    • Compare tradeoffs (latency, interpretability, domain shift, exact match vs semantic match).
    • When would you use hybrid retrieval?
  2. BM25:
    • Explain how BM25 scoring works conceptually (TF saturation, IDF, length normalization).
    • What are typical knobs/hyperparameters and practical pitfalls?
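A minimal BM25 implementation makes the three ingredients in question 2 visible: IDF weighting, TF saturation via `k1`, and length normalization via `b`. The corpus below is a toy example with whitespace tokenization; a real system would use a proper analyzer and an inverted index:

```python
import math
from collections import Counter

# Toy corpus (illustrative only).
docs = [
    "the cat sat on the mat",
    "dogs and cats living together",
    "the quick brown fox",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N
df = Counter(t for d in tokenized for t in set(d))  # document frequency

def idf(term: str) -> float:
    # Smoothed BM25 IDF: rare terms contribute more.
    return math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))

def bm25(query: str, doc: list[str], k1: float = 1.5, b: float = 0.75) -> float:
    tf = Counter(doc)
    score = 0.0
    for t in query.split():
        if t not in tf:
            continue
        # k1 controls TF saturation (repeats give diminishing returns);
        # b controls how strongly long documents are penalized.
        num = tf[t] * (k1 + 1)
        den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(t) * num / den
    return score

ranked = sorted(range(N), key=lambda i: bm25("cat mat", tokenized[i]),
                reverse=True)
```

Note the practical pitfall this toy exposes: "cats" in the second document scores zero for the query term "cat", because lexical retrieval only matches exact tokens (absent stemming), which is precisely the gap dense or hybrid retrieval is meant to close.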

Reinforcement learning basics (as used around LLMs)

  1. On-policy vs off-policy:
    • Define both, and give examples.
    • Why does the distinction matter for stability and sample efficiency?
  2. Q-learning:
    • What is the Q-function and the Bellman optimality equation?
    • Describe the Q-learning update rule and why it is considered off-policy.
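The off-policy character of Q-learning in question 2 can be shown in tabular form: the behavior policy below is uniformly random, yet the update bootstraps from `max_a' Q(s', a')`, i.e., the greedy target policy. The 5-state chain environment is a toy assumption for illustration:

```python
import random

random.seed(0)

# 5-state chain: action 0 = left, 1 = right; reward 1 on reaching the end.
N_STATES = 5
GOAL = N_STATES - 1

def step(s: int, a: int):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL  # next state, reward, done

Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma = 0.1, 0.9

for _ in range(300):
    s, done = 0, False
    for _ in range(100):  # cap episode length
        a = random.choice([0, 1])  # behavior policy: uniform random
        s2, r, done = step(s, a)
        # Bellman-optimality target: r + gamma * max_a' Q(s', a').
        # The max is over the GREEDY policy, not the random behavior
        # policy that generated the data -- this is what "off-policy" means.
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
        if done:
            break
```

Despite never acting greedily, the learned values recover the optimal "always go right" policy, with `Q[s][1] > Q[s][0]` in every non-terminal state; SARSA, by contrast, would plug the behavior policy's own next action into the target and is therefore on-policy.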

