Case study: choosing between fine-tuning and RAG for a client chatbot and improving retrieval quality.

When building an LLM application for a client, how would you decide between fine-tuning and Retrieval-Augmented Generation (RAG)? List and compare fine-tuning methods such as full fine-tuning, instruction tuning, LoRA, and embedding fine-tuning. Explain LoRA’s mechanism and its inference-time advantages. If retrieved documents show low relevance, how would you improve retrieval quality? Suppose the embedding model is the bottleneck: how would you fine-tune it, and what data and training procedure are required? Finally, how would you architect a chatbot capable of answering questions across multiple knowledge domains?
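As a starting point for the LoRA part of the question, here is a minimal sketch of the mechanism in plain Python. It assumes the standard formulation, in which a frozen weight matrix W receives a trainable low-rank update scaled by alpha / r. The dimensions, the random values, and the helper names (`matvec`, `matmul`, `lora_forward`, `merge_lora`) are illustrative assumptions, not part of the original question.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def lora_forward(W, A, B, x, alpha, r):
    """Unmerged forward pass: y = W x + (alpha / r) * B (A x)."""
    scale = alpha / r
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into the base weights: W' = W + (alpha / r) * B A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]


# Illustrative sizes only (assumption): real layers are far larger.
random.seed(0)
d_out, d_in, r, alpha = 6, 8, 2, 4

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen
A = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(r)]     # trainable
# B is normally initialised to zero; non-zero values here stand in for a
# trained adapter so that the update is visible.
B = [[random.gauss(0, 0.1) for _ in range(r)] for _ in range(d_out)]    # trainable
x = [random.gauss(0, 1) for _ in range(d_in)]

y_adapter = lora_forward(W, A, B, x, alpha, r)
y_merged = matvec(merge_lora(W, A, B, alpha, r), x)

full_params = d_out * d_in          # parameters updated by full fine-tuning
lora_params = r * (d_in + d_out)    # parameters updated by LoRA
```

The merged and unmerged outputs agree up to floating-point error, which is the basis of the inference-time argument: once the adapter is folded into W, serving needs no extra matrix multiplications. Keeping the adapter separate instead allows several task-specific adapters to share one base model.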

Compare the approaches on cost, data needs, and latency; propose an iterative loop of retrieval tuning, model tuning, and evaluation.
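The evaluation step in that loop needs a concrete retrieval metric. The sketch below computes recall@k and mean reciprocal rank (MRR) over a small labelled query set. The document IDs, the relevance labels, and the function names are hypothetical placeholders for illustration, not measured results.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k):
    """Average recall@k and MRR over all queries in `runs`."""
    n = len(runs)
    mean_recall = sum(recall_at_k(ret, rel, k) for ret, rel in runs.values()) / n
    mean_rr = sum(reciprocal_rank(ret, rel) for ret, rel in runs.values()) / n
    return mean_recall, mean_rr


# Hypothetical evaluation set: query -> (ranked retrieved IDs, relevant IDs).
runs = {
    "q1": (["d3", "d1", "d7"], {"d1"}),
    "q2": (["d2", "d9", "d4"], {"d4", "d5"}),
    "q3": (["d8", "d6", "d0"], {"d6"}),
}

recall_2, mrr = evaluate(runs, k=2)
recall_3, _ = evaluate(runs, k=3)
```

Running the same evaluation before and after each change (chunking, reranking, embedding fine-tuning) shows whether retrieval actually improved, independently of the generator's answer quality.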

Amazon Data Scientist interview question: Choose Between Fine-Tuning and RAG for Client Chatbot.