Evaluate RAG System Accuracy and Cost Control Strategies
Company: Amazon
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
##### Scenario
Deep-learning discussion covering LLM pipelines, knowledge-graph integration, and retrieval-augmented generation (RAG).
##### Question
1. How would you control the cost of maintaining a knowledge graph used by an LLM?
2. How do you measure the accuracy of LLM outputs, both offline and online?
3. Compare Transformer, RNN, and LSTM architectures. Why are Transformers preferred for modern LLMs?
4. Derive the scaled dot-product attention formula and explain each term.
5. Explain the end-to-end workflow of Retrieval-Augmented Generation (RAG).
6. What is a reranker model, and where does it sit in the RAG stack?
7. How does embedding vector dimensionality influence retrieval quality?
8. What is LoRA, how does it work, and why is it parameter-efficient?
9. How would you evaluate the accuracy of a RAG system?
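For the attention derivation requested above, the standard scaled dot-product formulation is:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

Here \(Q\), \(K\), and \(V\) are the query, key, and value matrices, and \(d_k\) is the key dimension. Scaling by \(\sqrt{d_k}\) keeps the dot-product magnitudes bounded as dimensionality grows, preventing the softmax from saturating and its gradients from vanishing.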
##### Hints
Relate theory to production: costs, equations, eval metrics (BLEU, EM, precision@k), trade-offs.
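As a minimal, self-contained sketch of two of the metrics named in the hint (function names and the sample data are illustrative, not from any specific evaluation library):

```python
def exact_match(prediction: str, reference: str) -> float:
    """Exact Match (EM): 1.0 if the normalized strings are identical, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def precision_at_k(retrieved: list, relevant: set, k: int) -> float:
    """precision@k: fraction of the top-k retrieved items that are relevant."""
    top_k = retrieved[:k]
    return sum(1 for doc in top_k if doc in relevant) / k


print(exact_match("Paris", "paris"))                           # 1.0
print(precision_at_k(["d1", "d7", "d3"], {"d1", "d3"}, k=3))   # 0.6666666666666666
```

In a production RAG evaluation, EM (or a softer string/semantic match) scores the generator's answers against references, while precision@k scores the retriever's ranked document list against labeled relevant documents.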
Quick Answer: This question evaluates understanding of LLM pipelines and Retrieval-Augmented Generation (RAG), spanning knowledge-graph cost control, retrieval and reranker roles, embedding dimensionality, attention mechanisms and architectures (RNN, LSTM, Transformer), LoRA, and end-to-end accuracy and grounding metrics across machine learning, natural language processing, and information retrieval. It is commonly used to assess theoretical depth alongside production-minded trade-offs in scalability, cost, latency, and metric-driven evaluation, testing both conceptual understanding and practical application.
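The LoRA point can be made concrete with a short numeric sketch (the dimensions here are hypothetical, chosen small for illustration): the weight update is factored as a rank-r product `B @ A`, so only `d_out*r + r*d_in` parameters are trained instead of the full `d_out*d_in`.

```python
import numpy as np

# Illustrative dimensions (hypothetical, much smaller than a real LLM layer).
d_in, d_out, r = 512, 512, 4

full_params = d_out * d_in            # params to fine-tune W directly
lora_params = d_out * r + r * d_in    # params in the adapters B and A

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                  # B starts at zero: no initial drift
x = rng.standard_normal(d_in)

# LoRA forward pass: frozen base path plus trainable low-rank update.
y = W @ x + B @ (A @ x)

print(full_params, lora_params)           # 262144 4096
```

With these dimensions the adapters hold 64x fewer trainable parameters than full fine-tuning, which is the essence of LoRA's parameter efficiency; because `B` is initialized to zero, the adapted model starts out exactly equal to the base model.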