Explain Central Limit Theorem and Its Limitations
Statistics Concepts and Disease-Test Evaluation
Context
You are assessing core statistical concepts used in evaluating diagnostic tests and in data science decision-making.
Assume:
- "Predicts positives with 95% accuracy" refers to sensitivity: P(test+ | infected) = 0.95.
- "Predicts negatives with 98% accuracy" refers to specificity: P(test− | not infected) = 0.98.
Questions
- Central Limit Theorem (CLT)
  - Explain the CLT, its prerequisites/assumptions, and situations where it can fail or be unreliable.
- Bayesian Inference
  - Describe Bayesian inference and why it is widely used in practice.
- Disease Test Posterior Probability
  - A disease affects 1 in 1,000 people (prevalence = 0.001). A test has sensitivity 95% and specificity 98%.
  - If the test flags someone positive, what is the probability they are truly infected? Show your reasoning. You may use a confusion matrix and Bayes' rule.
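For the CLT question, the finite-variance prerequisite can be made concrete with a small simulation. The sketch below is illustrative only: the helper names (`sample_means`, `iqr`, `draw_exponential`, `draw_cauchy`), the choice of distributions, and the sample sizes are assumptions made for this example, not part of the prompt. It compares the spread of sample means for an Exponential(1) distribution (finite mean and variance) with a standard Cauchy distribution (no finite mean or variance).

```python
import math
import random
import statistics

def sample_means(draw, n, trials, seed=0):
    """Return `trials` sample means, each computed from n i.i.d. draws."""
    rng = random.Random(seed)
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]

def iqr(values):
    """Interquartile range: a spread measure that exists even without a variance."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def draw_exponential(rng):
    # Exponential(1): skewed, but mean and variance are finite, so the CLT applies.
    return rng.expovariate(1.0)

def draw_cauchy(rng):
    # Standard Cauchy via the inverse CDF: no finite mean or variance,
    # so the classical CLT does not apply.
    return math.tan(math.pi * (rng.random() - 0.5))

for n in (4, 400):
    expo_spread = iqr(sample_means(draw_exponential, n, trials=1000))
    cauchy_spread = iqr(sample_means(draw_cauchy, n, trials=1000))
    print(f"n={n:4d}  exponential IQR={expo_spread:.3f}  Cauchy IQR={cauchy_spread:.3f}")
```

Under these assumptions, the exponential spread should shrink roughly in proportion to 1/sqrt(n), while the Cauchy spread should stay near 2 for every n, because the mean of n standard Cauchy draws is itself standard Cauchy. Dependence between observations and small n with heavy skew are other failure modes that this sketch does not cover.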
Hints
- Discuss i.i.d. assumptions and sample size for the CLT.
- Use Bayes’ formula for the posterior probability.
- Build a confusion matrix and compute the positive predictive value (PPV).
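The last two hints can be sketched together in a few lines. The function names `ppv_bayes` and `confusion_matrix` and the notional population of 100,000 are assumptions chosen for illustration. The inputs (prevalence 0.001, sensitivity 0.95, specificity 0.98) come from the prompt.

```python
def ppv_bayes(prevalence, sensitivity, specificity):
    """P(infected | test+) from Bayes' rule."""
    true_pos = sensitivity * prevalence                    # P(test+ and infected)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(test+ and not infected)
    return true_pos / (true_pos + false_pos)

def confusion_matrix(population, prevalence, sensitivity, specificity):
    """Expected counts in a hypothetical population of the given size."""
    infected = population * prevalence
    healthy = population - infected
    tp = infected * sensitivity
    fn = infected - tp
    tn = healthy * specificity
    fp = healthy - tn
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

cm = confusion_matrix(100_000, 0.001, 0.95, 0.98)
ppv_counts = cm["TP"] / (cm["TP"] + cm["FP"])
ppv = ppv_bayes(0.001, 0.95, 0.98)

print({name: round(count) for name, count in cm.items()})
print(f"PPV from counts = {ppv_counts:.4f}, PPV from Bayes' rule = {ppv:.4f}")
```

With 100,000 people, the expected counts are 95 true positives, 5 false negatives, 1,998 false positives, and 97,902 true negatives, so PPV = 95 / (95 + 1,998) = 95 / 2,093, or about 4.5%. Both routes give the same number. The low value is driven by the 0.1% prevalence: false positives from the large uninfected group far outnumber true positives.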
Constraints & Assumptions
- Preserve the scope, facts, inputs, and requested outputs from the prompt above.
- If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
- Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.
Clarifying Questions to Ask
- Clarify the random variables, distributional assumptions, independence assumptions, and desired output.
- Show enough derivation for the interviewer to follow the reasoning.
- Explain how you would validate the result with simulation or sensitivity checks.
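One possible sensitivity check for the disease-test question is to hold the test characteristics fixed and vary the prevalence. This is a minimal sketch: the function name `ppv` and the grid of prevalence values are assumptions for illustration. Only the first value (0.001) is given in the prompt.

```python
def ppv(prevalence, sensitivity=0.95, specificity=0.98):
    """Positive predictive value P(infected | test+) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical prevalence grid; 0.001 is the value stated in the prompt.
prevalences = [0.001, 0.01, 0.1]
results = {p: ppv(p) for p in prevalences}

for p, value in results.items():
    print(f"prevalence={p:<6} PPV={value:.3f}")
```

The same pattern can be used to vary sensitivity or specificity instead. With the values above, PPV rises steeply as prevalence increases, which shows how strongly the answer depends on the assumed base rate.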
What a Strong Answer Covers
- A correct setup with definitions, formulas, and boundary conditions.
- A step-by-step derivation or estimation plan.
- Interpretation of the result, including uncertainty and practical limitations.
- Checks for assumptions, edge cases, and numerical stability.
Follow-up Questions
- How would the result change if the assumptions were relaxed?
- Can you verify the answer with a simulation?
- What is the most likely source of estimation error?
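For the simulation follow-up, a Monte Carlo check of the disease-test posterior might look like the sketch below. The function name `simulate_ppv`, the population size, and the seed are assumptions chosen for illustration. It also assumes each person's infection status and test result are drawn independently.

```python
import random

def simulate_ppv(n_people, prevalence, sensitivity, specificity, seed=0):
    """Estimate P(infected | test+) by simulating n_people independent individuals."""
    rng = random.Random(seed)
    true_pos = 0
    false_pos = 0
    for _ in range(n_people):
        infected = rng.random() < prevalence
        if infected:
            positive = rng.random() < sensitivity    # detected with prob. = sensitivity
        else:
            positive = rng.random() >= specificity   # false alarm with prob. = 1 - specificity
        if positive:
            if infected:
                true_pos += 1
            else:
                false_pos += 1
    positives = true_pos + false_pos
    return true_pos / positives if positives else float("nan")

estimate = simulate_ppv(500_000, prevalence=0.001, sensitivity=0.95, specificity=0.98)
exact = (0.95 * 0.001) / (0.95 * 0.001 + 0.02 * 0.999)
print(f"simulated PPV = {estimate:.4f}, Bayes' rule PPV = {exact:.4f}")
```

Because only about 1 in 1,000 simulated people is infected, the estimate rests on a few hundred true positives and is noisy. The simulated value should land close to the analytic result, but not match it exactly. The rarity of the positive class is itself a likely source of estimation error when the same quantity is estimated from real data.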