This question tests understanding of Bernoulli trials and independence, as well as comparative hypothesis testing for proportions. It belongs to the Statistics & Math domain and is typical for a Data Scientist role.

You are evaluating chatbot/LLM responses. Treat each response as a Bernoulli trial (good vs not good). Unless otherwise noted, assume independence across responses.
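Under these assumptions, the number of good responses out of n is Binomial(n, p). A common way to compare two such rates is a pooled two-proportion z-test. The sketch below is illustrative only. The function name `two_proportion_z_test` and the counts in the usage example are hypothetical. The test relies on the normal approximation, so it assumes reasonably large counts of both good and not-good responses in each group.

```python
import math


def two_proportion_z_test(good_a: int, n_a: int, good_b: int, n_b: int) -> tuple[float, float]:
    """Hypothetical helper: two-sided pooled z-test of H0: p_a == p_b.

    Assumes every response is an independent Bernoulli trial, so each
    group's count of good responses is Binomial(n, p). Uses the normal
    approximation, which needs reasonably large counts.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = good_a / n_a
    p_b = good_b / n_b
    # Under H0 both groups share one success rate, estimated by pooling.
    p_pool = (good_a + good_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; z-test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value: P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, with hypothetical counts of 180/200 good responses for one model and 160/200 for another, `two_proportion_z_test(180, 200, 160, 200)` gives z ≈ 2.80 and a two-sided p-value of roughly 0.005. If responses are not independent (for example, several responses to the same prompt), the variance formula above understates the uncertainty.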