{"blocks": [{"key": "1cb99009", "text": "Scenario", "type": "header-two", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}, {"key": "44315d60", "text": "Evaluating a customer-service chatbot: P(honest answer)=0.7, P(relevant answer)=0.8.", "type": "unstyled", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}, {"key": "6f780bfc", "text": "Question", "type": "header-two", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}, {"key": "8289d5c3", "text": "What is the probability that an answer is both honest and relevant if the two events are independent? Given logs of 1,000 answers, how many would you expect to be neither honest nor relevant under the same independence assumption? Describe how you would run a hypothesis test to compare two LLMs’ relevance rates at α=0.05.", "type": "unstyled", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}, {"key": "fd35c54f", "text": "Hints", "type": "header-two", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}, {"key": "a708c349", "text": "Basic probability rules; two-proportion z-test setup and interpretation.", "type": "unstyled", "depth": 0, "inlineStyleRanges": [], "entityRanges": [], "data": {}}], "entityMap": {}}