Statistics Concepts and Disease-Test Evaluation
Context
You are assessing core statistical concepts used in evaluating diagnostic tests and in data science decision-making.
Assume:
- "Predicts positives with 95% accuracy" refers to sensitivity: P(test+ | infected) = 0.95.
- "Predicts negatives with 98% accuracy" refers to specificity: P(test− | not infected) = 0.98.
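Each test result is either positive or negative. The two assumptions therefore also fix the complementary error rates that the posterior calculation needs:

```latex
\begin{aligned}
P(\text{test}{-} \mid \text{infected}) &= 1 - 0.95 = 0.05 && \text{(false negative rate)} \\
P(\text{test}{+} \mid \text{not infected}) &= 1 - 0.98 = 0.02 && \text{(false positive rate)}
\end{aligned}
```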
Questions
- Central Limit Theorem (CLT)
  - Explain the CLT, its prerequisites/assumptions, and situations where it can fail or be unreliable.
- Bayesian Inference
  - Describe Bayesian inference and why it is widely used in practice.
- Disease Test Posterior Probability
  - A disease affects 1 in 1,000 people (prevalence = 0.001). The test has sensitivity 95% and specificity 98%.
  - If the test flags someone as positive, what is the probability that they are truly infected? Show your reasoning; you may use a confusion matrix and Bayes' rule.
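What follows is a minimal Python sketch of the posterior calculation for the third question, assuming only the stated prevalence, sensitivity, and specificity. It applies Bayes' rule, P(infected | test+) = P(test+ | infected) P(infected) / P(test+), with P(test+) from the law of total probability. It then checks the answer against expected confusion-matrix counts. The helper names `posterior_positive` and `confusion_matrix_counts` are illustrative. The population of 1,000,000 is an arbitrary size chosen so the counts are whole numbers. The last step is a hypothetical extension, not part of the problem. It treats a second positive from a repeat test as new evidence and reuses the first posterior as the prior. This assumes the two results are conditionally independent given infection status, and it illustrates the sequential updating behind the Bayesian-inference question.

```python
def posterior_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Return P(infected | test+) using Bayes' rule."""
    false_positive_rate = 1.0 - specificity  # P(test+ | not infected)
    # Law of total probability: P(test+) summed over infected and non-infected people.
    p_positive = sensitivity * prevalence + false_positive_rate * (1.0 - prevalence)
    return sensitivity * prevalence / p_positive


def confusion_matrix_counts(population: int, prevalence: float,
                            sensitivity: float, specificity: float) -> tuple:
    """Return expected (TP, FN, FP, TN) counts for a hypothetical population."""
    infected = round(population * prevalence)
    healthy = population - infected
    tp = round(sensitivity * infected)  # infected and flagged positive
    fn = infected - tp                  # infected but missed
    tn = round(specificity * healthy)   # not infected and correctly negative
    fp = healthy - tn                   # not infected but flagged positive
    return tp, fn, fp, tn


PREVALENCE, SENSITIVITY, SPECIFICITY = 0.001, 0.95, 0.98

ppv = posterior_positive(PREVALENCE, SENSITIVITY, SPECIFICITY)
print(f"Bayes' rule: P(infected | test+) = {ppv:.4f}")  # 0.0454

tp, fn, fp, tn = confusion_matrix_counts(1_000_000, PREVALENCE, SENSITIVITY, SPECIFICITY)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=950 FN=50 FP=19980 TN=979020
print(f"Confusion matrix: PPV = TP / (TP + FP) = {tp / (tp + fp):.4f}")  # 0.0454

# Hypothetical extension: a second, conditionally independent positive test,
# with the first posterior used as the new prior (sequential Bayesian updating).
ppv_two = posterior_positive(ppv, SENSITIVITY, SPECIFICITY)
print(f"After two positives: P(infected | ++) = {ppv_two:.4f}")  # 0.6931
```

Both routes give a posterior of only about 4.5%. Because the disease is rare, the 19,980 false positives among the 999,000 non-infected people far outnumber the 950 true positives.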
Hints
- Discuss i.i.d. assumptions and sample size for the CLT.
- Use Bayes' formula for the posterior probability.
- Build a confusion matrix and compute the positive predictive value (PPV).
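A small simulation can illustrate the CLT hint. The sketch below uses only the Python standard library and draws i.i.d. samples from two distributions. The exponential distribution is skewed but has finite variance. The standard Cauchy distribution has no finite mean or variance. For each sample size n, the sketch reports the interquartile range (IQR) of the sample means. Under the CLT, the exponential IQR should shrink roughly like 1.349/√n, which is the IQR of a normal distribution with standard deviation 1/√n. The mean of n standard Cauchy draws is itself standard Cauchy, so its IQR stays near 2 however large n gets. The distributions, seed, sample sizes, repetition count, and helper names (`sample_means`, `iqr`, `run`) are arbitrary choices for illustration.

```python
import math
import random
import statistics


def exponential(rng: random.Random) -> float:
    # Exp(1): mean 1, variance 1, strongly right-skewed; the CLT applies.
    return rng.expovariate(1.0)


def cauchy(rng: random.Random) -> float:
    # Standard Cauchy via the inverse CDF; mean and variance are undefined,
    # so the CLT's finite-variance prerequisite fails.
    return math.tan(math.pi * (rng.random() - 0.5))


def sample_means(draw, n: int, reps: int, rng: random.Random) -> list:
    # Each entry is the mean of n i.i.d. draws from the given distribution.
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def iqr(values) -> float:
    # Interquartile range: a spread measure that exists even for Cauchy data.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def run(seed: int = 0, reps: int = 1000, sizes=(5, 50, 500)) -> dict:
    # Map each sample size n to (IQR of exponential means, IQR of Cauchy means).
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        exp_iqr = iqr(sample_means(exponential, n, reps, rng))
        cauchy_iqr = iqr(sample_means(cauchy, n, reps, rng))
        results[n] = (exp_iqr, cauchy_iqr)
    return results


if __name__ == "__main__":
    for n, (exp_iqr, cauchy_iqr) in run().items():
        clt_iqr = 1.349 / math.sqrt(n)  # IQR of a normal with standard deviation 1/sqrt(n)
        print(f"n={n:>3}: exponential IQR={exp_iqr:.3f} (CLT ~{clt_iqr:.3f}), "
              f"Cauchy IQR={cauchy_iqr:.3f}")
```

The exponential case reflects the sample-size caveat: for small n and skewed data, the normal approximation is only rough. The Cauchy case shows outright failure when the finite-variance prerequisite does not hold. Dependence between observations, which violates the i.i.d. assumption, is another common failure mode and is not simulated here.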