This question evaluates understanding of KV cache mechanisms in Transformer inference, including attention-state caching, memory and latency trade-offs, and engineering optimizations for autoregressive decoding.
In Transformer-based language model inference, what is a key-value (KV) cache?
Explain:
- what the KV cache stores and why it allows attention states to be reused across decoding steps
- the memory and latency trade-offs it introduces
- engineering optimizations that build on it for autoregressive decoding
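A strong answer might include a minimal sketch of the mechanism. The toy single-head example below (NumPy, with identity projections and names like `KVCache` chosen purely for illustration, not taken from any real library) shows the core idea: at each decoding step only the newest token's key and value are computed and appended to a cache, so attention for that step costs O(t) rather than recomputing the full t x t score matrix.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Stores keys/values for all past positions of one attention head."""
    def __init__(self, d_head):
        self.keys = np.empty((0, d_head))    # shape (t, d_head), grows by 1 per step
        self.values = np.empty((0, d_head))

    def append(self, k, v):
        # k, v: (1, d_head) projections for the newest token only.
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])
        return self.keys, self.values

def attend_with_cache(q, k_new, v_new, cache):
    # q, k_new, v_new: (1, d_head) for the current token.
    K, V = cache.append(k_new, v_new)         # reuse past K/V, append only the new pair
    scores = q @ K.T / np.sqrt(q.shape[-1])   # (1, t): one query row, not a t x t matrix
    return softmax(scores) @ V                # (1, d_head) attention output

# Toy autoregressive decode loop: per step we project only the newest token.
rng = np.random.default_rng(0)
d = 8
cache = KVCache(d)
for step in range(5):
    x = rng.normal(size=(1, d))   # stand-in for the new token's hidden state
    q, k, v = x, x, x             # identity projections for brevity
    out = attend_with_cache(q, k, v, cache)

print(cache.keys.shape)           # cache holds one K row per generated token
```

The memory trade-off is visible directly: the cache grows linearly with sequence length (and, in a real model, scales with layers, heads, and batch size), which is exactly what optimizations such as quantized or paged KV storage target.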