You are interviewing for an applied ML role focused on LLM agents and retrieval-augmented generation (RAG). Answer the following conceptual questions clearly and with examples:
Transformer/NLP foundations
- Encoder-only vs encoder–decoder vs decoder-only architectures:
  - What are the key differences in objective, attention pattern, and typical use-cases?
  - Give representative model families for each.
  - For tasks like classification, translation, and open-ended generation, which would you choose and why?
- Word2Vec:
  - Explain how Word2Vec learns embeddings (CBOW/Skip-gram; negative sampling or hierarchical softmax).
  - Contrast static embeddings (e.g., Word2Vec) with contextual embeddings (e.g., Transformer-based). When does each fail?
LLM agents
- LLM-as-a-judge / LLM-based evaluation:
  - How would you use an LLM to evaluate agent outputs?
  - What failure modes (bias, verbosity preference, prompt sensitivity, leakage) and mitigations would you consider?
  - What metrics would you report for agent quality (task success, tool-use correctness, groundedness, etc.)?
- ReAct:
  - Explain how the ReAct paradigm works at a high level.
  - Why can interleaving reasoning + actions help compared to pure “think then answer”?
- Agent vs LLM:
  - What is the fundamental difference between an “agent” and a standalone LLM?
  - Name and explain common agent components (e.g., goal, planner, tool interface, memory, policy/executor, evaluator).
Retrieval for RAG
- Lexical (sparse) vs dense retrieval:
  - Define lexical-based retrieval and dense-based retrieval.
  - Compare tradeoffs (latency, interpretability, domain shift, exact match vs semantic match).
  - When would you use hybrid retrieval?
- BM25:
  - Explain how BM25 scoring works conceptually (TF saturation, IDF, length normalization).
  - What are typical knobs/hyperparameters and practical pitfalls?
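
As a reference point for the BM25 question, the three ingredients it names (TF saturation, IDF, length normalization) can be sketched in a few lines. This is a minimal illustration of one common formulation, not a definitive implementation: the function name `bm25_scores`, the toy corpus, the defaults `k1=1.2` and `b=0.75`, and the Lucene-style `log(1 + ...)` IDF are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 controls how quickly term frequency saturates; b controls how strongly
    scores are normalized by document length (0 = none, 1 = full).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF: rare terms weigh more. The "1 +" keeps it non-negative
            # (an assumed variant; other formulations exist).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(d) / avg_len)
            # TF saturation: the ratio approaches (k1 + 1) as tf grows.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus: three documents of equal length.
demo_docs = [
    ["x"] * 1 + ["y"] * 9,   # query term appears once
    ["x"] * 5 + ["y"] * 5,   # query term appears five times
    ["z"] * 10,              # query term absent
]
demo_scores = bm25_scores(["x"], demo_docs)
```

In this toy corpus the second document contains the query term five times as often as the first but scores less than five times higher, which is the saturation effect the question asks about.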
Reinforcement learning basics (as used around LLMs)
- On-policy vs off-policy:
  - Define both, and give examples.
  - Why does the distinction matter for stability and sample efficiency?
- Q-learning:
  - What is the Q-function and the Bellman optimality equation?
  - Describe the Q-learning update rule and why it is considered off-policy.
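
As a reference point for the Q-learning question, the update rule can be sketched in tabular form. This is a minimal sketch under stated assumptions: the 1-D chain environment, the function name `q_learning`, and all hyperparameter defaults are hypothetical and chosen only to keep the example small. The update it applies is Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)].

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    Actions: 0 = left, 1 = right. The agent starts in state 0; stepping into
    the rightmost state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Behaviour policy: epsilon-greedy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([act for act in (0, 1) if q[s][act] == best])

            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # The target takes the max over next actions, i.e. it evaluates
            # the greedy policy regardless of which action the epsilon-greedy
            # behaviour policy takes next. That mismatch is why the method is
            # off-policy. The goal row of q is never updated, so it stays 0.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q
```

Replacing `max(q[s_next])` with the value of the action the behaviour policy actually takes in `s_next` would turn this into SARSA, an on-policy counterpart, which makes the two easy to contrast.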