Technical Phone Screen: LLM Pipelines, Knowledge Graphs, and RAG
Context
You are designing and operating LLM-based applications that integrate a knowledge graph (KG) and Retrieval-Augmented Generation (RAG). Answer the following to demonstrate both theoretical understanding and awareness of production trade-offs.
Questions
- Cost control for knowledge graphs
  - How would you control the cost of building, storing, updating, and serving a knowledge graph used by an LLM?
- Measuring LLM accuracy (offline and online)
  - How do you measure the accuracy and quality of LLM outputs offline and online? Include task-specific metrics (e.g., EM/F1 for QA), generation metrics (e.g., BLEU/ROUGE), and production signals. (A worked EM/F1 example appears in the Reference sketches below.)
- Model comparison
  - Compare RNN, LSTM, and Transformer architectures. Why are Transformers preferred for modern LLMs?
- Scaled dot-product attention
  - Derive the scaled dot-product attention formula and explain each term and the motivation for scaling. (The target formula appears in the Reference sketches below.)
- RAG end-to-end workflow
  - Explain the end-to-end workflow of Retrieval-Augmented Generation: ingestion, indexing, retrieval, ranking, prompting, and generation. (A minimal pipeline sketch appears below.)
- Reranker role in RAG
  - What is a reranker model, and where does it sit in the RAG stack? Discuss the trade-offs. (See the retrieve-then-rerank sketch below.)
- Embedding dimensionality and retrieval quality
  - How does embedding vector dimensionality influence retrieval quality, memory, and latency? What are the trade-offs and heuristics for choosing a dimension? (A memory-footprint calculation appears below.)
- LoRA
  - What is LoRA, how does it work, and why is it parameter-efficient? Where is it applied in LLMs? (A schematic LoRA layer appears below.)
- Evaluating a RAG system
  - How would you evaluate the accuracy and groundedness of a RAG system end-to-end? Include retrieval, grounding, and generation metrics, as well as online evaluation. (A precision@k / recall@k sketch appears below.)
Hint: Relate theory to production concerns: cost control, key equations, evaluation metrics (BLEU, EM, precision@k), and latency/quality trade-offs.
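Reference sketches

The sketches below are minimal, non-authoritative illustrations of the formulas and pipeline steps named in the questions. Function names, corpus contents, and numbers are made up for illustration and do not refer to any specific library API.

For the accuracy-measurement question, a common offline QA scoring pair is exact match (after light normalization) and token-level F1, sketched here in Python:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall over normalized tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))   # 1.0
print(round(token_f1("Paris, France", "Paris"), 2))      # 0.67
```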
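For the scaled dot-product attention question, the target formula and the usual argument for the scaling factor, written out in LaTeX:

```latex
% Scaled dot-product attention with Q \in \mathbb{R}^{n \times d_k},
% K \in \mathbb{R}^{m \times d_k}, V \in \mathbb{R}^{m \times d_v}.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

% Motivation for the 1/\sqrt{d_k} scaling: if the components of q and k are
% independent with zero mean and unit variance, then
\mathrm{Var}(q \cdot k) = \sum_{i=1}^{d_k} \mathrm{Var}(q_i k_i) = d_k,
% so dividing the logits by \sqrt{d_k} keeps their variance near 1 and prevents
% the softmax from saturating (and its gradients from vanishing) as d_k grows.
```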
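For the RAG workflow question, a deliberately minimal end-to-end sketch covering ingestion, indexing, retrieval, prompting, and generation over an in-memory corpus. `embed` is a toy feature-hashing embedder and `llm_generate` is a placeholder; both are assumptions standing in for real models, not any particular library:

```python
import numpy as np

def embed(texts: list[str], dim: int = 256) -> np.ndarray:
    """Toy feature-hashing bag-of-words embedding; a real system would use a
    trained embedding model. Returns unit-norm vectors."""
    vecs = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            vecs[i, hash(token) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)

def llm_generate(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    return f"[answer generated from a {len(prompt)}-char grounded prompt]"

# Ingestion: in practice, load documents, clean them, and split into chunks.
documents = [
    "LoRA adds low-rank adapters to frozen weight matrices.",
    "A reranker rescores retrieved passages with a cross-encoder.",
    "Knowledge graphs store entities and typed relations.",
]

# Indexing: embed each chunk and store the vectors (here, a plain matrix).
index = embed(documents)

# Retrieval and ranking: embed the query, rank chunks by cosine similarity.
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed([query])[0]
    scores = index @ q                      # cosine similarity (unit vectors)
    top = np.argsort(-scores)[:k]
    return [documents[i] for i in top]

# Prompting and generation: pack the top chunks into the prompt, call the LLM.
query = "Where does a reranker sit in a RAG stack?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(llm_generate(prompt))
```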
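For the reranker question, the retrieve-then-rerank pattern: a cheap first-stage retriever keeps recall high over a large corpus, then a slower cross-encoder rescores the shortlist to raise precision at the top. `cross_encoder_score` below is a hypothetical stand-in (plain token overlap) so the sketch runs without a model:

```python
def cross_encoder_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder relevance model that jointly
    scores a (query, passage) pair. Here: simple token-overlap ratio."""
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / max(len(q_tokens), 1)

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Reorder first-stage candidates by cross-encoder score, keep the best."""
    ranked = sorted(candidates, key=lambda p: cross_encoder_score(query, p), reverse=True)
    return ranked[:top_k]

# Trade-off: the first stage stays fast over millions of documents; the
# reranker adds latency per candidate but improves the precision of the
# few passages that actually reach the prompt.
candidates = [
    "A reranker rescores retrieved passages with a cross-encoder.",
    "Knowledge graphs store entities and typed relations.",
    "Embeddings map text to dense vectors for similarity search.",
]
print(rerank("what does a reranker do", candidates, top_k=2))
```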
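For the embedding-dimensionality question, the memory side of the trade-off is plain arithmetic; the corpus size and dimensions below are illustrative, not benchmarks:

```python
# Raw float32 index size = num_vectors * dimension * 4 bytes, ignoring index
# overhead, metadata, and any compression such as PQ or int8 quantization.
def index_size_gb(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    return num_vectors * dim * bytes_per_value / 1e9

for dim in (256, 768, 1536, 3072):
    print(f"10M vectors at d={dim}: {index_size_gb(10_000_000, dim):.1f} GB")
# d=256  -> 10.2 GB     d=768  -> 30.7 GB
# d=1536 -> 61.4 GB     d=3072 -> 122.9 GB
# Higher dimensions can capture more semantic nuance, but memory, bandwidth,
# and per-query distance computation all grow roughly linearly in d.
```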
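For the LoRA question, a NumPy schematic of the core idea: the pretrained weight W stays frozen and only a low-rank update ΔW = B A, scaled by α/r, is trained; in LLMs the adapters are most often attached to attention projection matrices. The shapes and counts below are illustrative, not any library's implementation:

```python
import numpy as np

d_out, d_in, r, alpha = 4096, 4096, 8, 16

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable low-rank "down" projection
B = np.zeros((d_out, r))                  # trainable, zero-initialized so the
                                          # adapter starts as a no-op

def lora_linear(x: np.ndarray) -> np.ndarray:
    """y = x W^T + (alpha / r) * x A^T B^T; only A and B receive gradients."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(2, d_in))            # batch of 2 activations
print(lora_linear(x).shape)               # (2, 4096)

full = d_out * d_in                       # parameters updated in a full fine-tune of W
lora = r * (d_in + d_out)                 # parameters in the LoRA adapter
print(full, lora, f"{lora / full:.4%}")   # 16777216 65536 0.3906%
```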
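For the RAG-evaluation question, retrieval is usually scored with rank metrics such as precision@k and recall@k against labeled relevant passages, while groundedness and answer quality are judged separately (human review or LLM-as-judge) and tracked online via user feedback. A minimal retrieval-metrics sketch with made-up document IDs:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved passages that are labeled relevant."""
    top_k = retrieved[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant passages that appear in the top k."""
    top_k = retrieved[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / max(len(relevant), 1)

retrieved = ["d3", "d7", "d1", "d9", "d2"]   # ranked retriever output (illustrative IDs)
relevant = {"d1", "d2", "d4"}                # gold labels for this query

print(precision_at_k(retrieved, relevant, k=5))   # 0.4   (d1 and d2 are in the top 5)
print(recall_at_k(retrieved, relevant, k=5))      # 0.666... (2 of 3 relevant found)
```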