Explain Medical AI Data and Evaluation
Company: Oracle
Role: Data Scientist
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
You are discussing a prior project on medical conversational AI. Assume proprietary production data is limited, so you may begin with open-source healthcare dialogue or medical question-answering data.
Give a structured answer to the following:
1. What data would you use, how would you clean it, and how would you split it to avoid leakage?
2. What modeling approach would you choose: prompting a general LLM, fine-tuning a domain model, retrieval-augmented generation (RAG), or a hybrid system? What are the trade-offs?
3. How would you evaluate the system? Include both automatic and human evaluation, and explain why standard text-generation metrics alone are insufficient in healthcare.
4. What risks would you watch for, such as hallucination, unsafe advice, calibration errors, subgroup bias, and distribution shift between open-source data and real clinical use?
5. If the goal were to publish a paper, what baselines, ablations, and statistical checks would you include?
Assume the system supports medical question answering or clinical conversation, and the interviewer wants a high-level but technically rigorous explanation of the data, model, and evaluation choices.
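To make the leakage concern in item 1 concrete, here is a minimal sketch of a group-aware split. It assumes each record is a dict with a `question` field and, optionally, a `source_id` field (a conversation or patient identifier); both field names are hypothetical, as are the 10% validation and test fractions. Records that share a group key always land in the same split, so lightly reworded duplicates of one question cannot straddle train and test.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def assign_split(group_key: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Map a group key to a split deterministically via a hash bucket in [0, 1)."""
    digest = hashlib.sha256(group_key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def split_records(records, key_field="question"):
    """Split records so that all members of a group share one split.

    The group key is the (hypothetical) `source_id` when present, otherwise
    the normalized question text.
    """
    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        group_key = rec.get("source_id") or normalize(rec[key_field])
        splits[assign_split(group_key)].append(rec)
    return splits
```

Exact-match normalization only catches trivial duplicates. Paraphrased questions would need a fuzzier grouping step (for example, clustering on embeddings) before the hash assignment, which this sketch does not attempt.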
Quick Answer: This question evaluates competence in medical conversational AI across six areas: data sourcing and cleaning; leakage-aware dataset splitting; model choice (prompting, fine-tuning, RAG, or a hybrid); evaluation design (automatic metrics and human evaluation); risk identification (hallucination, unsafe advice, calibration errors, subgroup bias, and distribution shift); and experimental rigor, including baselines, ablations, and statistical checks. It is commonly asked in Machine Learning and Clinical NLP interviews because it probes both conceptual understanding of the trade-offs and the practical skills needed to build safe, reliable medical QA or conversational systems under reproducibility requirements and regulatory-sensitive data handling constraints.
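As an illustration of the statistical checks the question asks about, the sketch below runs a paired bootstrap over per-example scores from two systems evaluated on the same test questions. The function name, the 2,000 resamples, and the 95% percentile interval are illustrative assumptions, not a prescribed protocol. The per-example scores could be correctness flags or clinician ratings.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Return (mean difference, 95% CI lower bound, 95% CI upper bound).

    scores_a[i] and scores_b[i] must be scores for the same test example,
    so resampling indices keeps the pairing intact.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and the same length")
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / n

    # Resample examples with replacement and record each mean difference.
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)

    # Percentile interval from the sorted bootstrap distribution.
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return observed, lower, upper
```

If the interval excludes zero, the difference between the two systems is unlikely to be an artifact of which test examples happened to be sampled. It says nothing about whether the test set itself resembles real clinical use, which is the separate distribution-shift concern.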