Answer the following LLM-focused questions.
1) Transformer basics
- What problem does the Transformer architecture solve compared with RNNs?
- Explain the main components:
  - token embeddings and positional information
  - self-attention (including what "Q/K/V" are)
  - multi-head attention
  - feed-forward network, residual connections, layer norm
- What is the computational complexity of full self-attention with respect to sequence length L?
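As a reference point for the self-attention and complexity questions, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. It is illustrative only (real implementations use batched matrix libraries); note the score matrix has L x L entries, which is where the O(L^2) cost in sequence length comes from.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
    Q, K, V are lists of L row vectors. Each query attends over all
    L keys, so the full score matrix is L x L -> O(L^2) in length."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # One row of QK^T, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Weighted sum of value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With a zero query, all keys score equally, so the output is the uniform average of the values — a handy sanity check when answering the Q/K/V question.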
2) Real-world LLM deployment
You are asked to deploy an LLM-powered feature (e.g., an internal assistant or a customer support bot).
- List the main real-world challenges (latency, cost, quality, safety, privacy, etc.).
- Propose a deployment architecture and concrete mitigations for those challenges.
- Describe how you would evaluate the system offline and monitor it online after launch.
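One concrete artifact a strong answer to the offline-evaluation question might include is a small golden-set harness. This sketch is hypothetical — `model_fn` and the eval items stand in for whatever model endpoint and test set the candidate proposes — and exact match is just the simplest scoring rule (answers might swap in LLM-as-judge or semantic similarity).

```python
def exact_match_eval(model_fn, eval_set):
    """Score model_fn against a golden set of (prompt, expected) pairs
    using exact match after whitespace stripping. Returns accuracy.
    model_fn and eval_set are hypothetical stand-ins."""
    correct = sum(
        1 for prompt, expected in eval_set
        if model_fn(prompt).strip() == expected.strip()
    )
    return correct / len(eval_set)

# Illustrative usage with a dummy "model" that always answers "4":
golden = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]
score = exact_match_eval(lambda p: "4", golden)  # 1 of 2 correct -> 0.5
```

Online, the same scoring hook can be pointed at sampled production traffic (with human or judge labels) to track quality drift after launch.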