Bayesian And Base-Rate Reasoning
Asked of: Data Scientist

What's being tested
Ability to apply Bayes' rule to real-world classification and decision problems: combining priors (base rates) with likelihoods (test performance) to compute posterior probabilities and make cost-aware decisions. It also tests whether you recognize base-rate neglect and state the model assumptions you are relying on.
Core knowledge
- Bayes' rule: P(H|D) = P(D|H)P(H) / [P(D|H)P(H)+P(D|¬H)P(¬H)].
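- Prior = base rate/prevalence; likelihood = probability of the observed result under each hypothesis (for a positive test: sensitivity if the condition is present, 1 − specificity if it is absent); posterior = updated belief.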
- Sensitivity = P(test+|condition); specificity = P(test−|no condition).
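- Positive predictive value (PPV) = P(condition|test+) and negative predictive value (NPV) = P(no condition|test−); both depend strongly on prevalence.
- False positive paradox: when prevalence is low, false positives can outnumber true positives even with high specificity, so most positive results are wrong.
- Quick Bayesian update for binary data: a Beta(alpha, beta) prior is conjugate to the Bernoulli likelihood; after s successes and f failures the posterior is Beta(alpha + s, beta + f).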
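The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`ppv`, `npv`, `beta_update`) and all numbers (a test with 99% sensitivity and specificity, a Beta(1, 1) prior, 3 successes and 17 failures) are assumptions chosen for the example.

```python
def ppv(prevalence, sensitivity, specificity):
    """P(condition | test+) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(prevalence, sensitivity, specificity):
    """P(no condition | test-) via Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def beta_update(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus Bernoulli counts."""
    return alpha + successes, beta + failures


if __name__ == "__main__":
    # Hypothetical test: 99% sensitivity and 99% specificity.
    # PPV falls as the condition gets rarer, although the test is unchanged.
    for prev in (0.5, 0.1, 0.01, 0.001):
        print(f"prevalence={prev:<6} PPV={ppv(prev, 0.99, 0.99):.3f}")

    # Hypothetical data: uniform Beta(1, 1) prior, 3 successes, 17 failures.
    a, b = beta_update(1, 1, successes=3, failures=17)
    print(f"posterior Beta({a}, {b}), mean={a / (a + b):.3f}")
```

With these assumed numbers, the PPV at 1% prevalence is exactly 0.5: a positive result from a "99% accurate" test is a coin flip.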
Worked example — "Medical diagnostic test for a rare disease" (typical framing)
First, list the known quantities: disease prevalence (prior), test sensitivity and specificity (likelihood), and what the interviewer asks for (PPV, NPV, or an action threshold). Write out Bayes' formula and explicitly compute PPV = P(disease|test+). Then discuss the decision context: what a false positive costs relative to a false negative, and whether a confirmatory second test is worthwhile. If data are available, propose a Beta prior on the prevalence and update it with the observed counts, or simulate posteriors with simple Monte Carlo to reflect uncertainty.
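A sketch of those steps with made-up numbers may help fix the arithmetic. Everything specific here is an assumption for illustration: 1% prevalence, 95% sensitivity and specificity, a second test whose errors are conditionally independent of the first given disease status, and costs of 1 (false positive) and 9 (false negative) with zero cost for correct decisions.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' rule for a binary hypothesis H and observed data D."""
    numerator = p_data_given_h * prior
    return numerator / (numerator + p_data_given_not_h * (1.0 - prior))


def action_threshold(cost_fp, cost_fn):
    """Posterior above which acting has lower expected cost than not acting.

    Acting costs (1 - p) * cost_fp in expectation; not acting costs
    p * cost_fn. Acting is cheaper when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


# Hypothetical inputs (not from any real test).
PREVALENCE = 0.01
SENSITIVITY = 0.95
SPECIFICITY = 0.95

# For a positive result: P(+|disease) = sensitivity,
# P(+|no disease) = 1 - specificity.
after_one = posterior(PREVALENCE, SENSITIVITY, 1.0 - SPECIFICITY)

# Second positive result: yesterday's posterior becomes today's prior.
# Valid only if the two tests err independently given disease status.
after_two = posterior(after_one, SENSITIVITY, 1.0 - SPECIFICITY)

threshold = action_threshold(cost_fp=1.0, cost_fn=9.0)

if __name__ == "__main__":
    print(f"P(disease | one positive)  = {after_one:.3f}")
    print(f"P(disease | two positives) = {after_two:.3f}")
    print(f"act if posterior > {threshold:.2f}")
```

Under these assumptions one positive result gives a posterior of about 0.161 and two give about 0.785; with the assumed 1:9 cost ratio the action threshold is 0.10, so even a single positive would justify follow-up.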
A common pitfall
Picking overall accuracy or sensitivity as the metric for "how good" a test instead of PPV/NPV leads to wrong conclusions when prevalence is low. Interviewees also often neglect uncertainty in the base rate or assume independence of multiple tests without justification. Always translate model outputs to decision-relevant quantities and check assumptions.
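To make the pitfall concrete, the sketch below contrasts overall accuracy with PPV at low prevalence, then propagates uncertainty in the base rate through to the PPV by sampling. The numbers are illustrative assumptions: a 99%/99% test, 0.1% prevalence, and a Beta(2, 998) belief about prevalence. `ppv_interval` is a hypothetical helper name, and it uses only the standard library's `random.Random.betavariate`.

```python
import random


def accuracy(prevalence, sensitivity, specificity):
    """Overall fraction of correct results across the whole population."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


def ppv(prevalence, sensitivity, specificity):
    """P(condition | test+). Undefined if no positives are possible."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def ppv_interval(alpha, beta, sensitivity, specificity,
                 n_draws=10_000, seed=0):
    """Central 95% interval for PPV when prevalence ~ Beta(alpha, beta).

    Simple Monte Carlo: draw a prevalence, compute the PPV, repeat,
    then read off empirical quantiles of the sorted draws.
    """
    rng = random.Random(seed)
    draws = sorted(
        ppv(rng.betavariate(alpha, beta), sensitivity, specificity)
        for _ in range(n_draws)
    )
    low = draws[int(0.025 * n_draws)]
    high = draws[int(0.975 * n_draws) - 1]
    return low, high


if __name__ == "__main__":
    # Hypothetical screening test applied to a rare condition.
    print(f"accuracy = {accuracy(0.001, 0.99, 0.99):.3f}")
    print(f"PPV      = {ppv(0.001, 0.99, 0.99):.3f}")

    low, high = ppv_interval(2, 998, 0.99, 0.99)
    print(f"95% interval for PPV: ({low:.3f}, {high:.3f})")
```

With the assumed inputs, accuracy is 0.99 while PPV is only about 0.09, which is why accuracy alone says little about what a positive result means. The interval shows how much the PPV moves when the base rate itself is uncertain.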
Further reading
- Gelman et al., "Bayesian Data Analysis" (for decision-theoretic framing and hierarchical priors).
- Allen B. Downey, "Think Bayes" (practical tutorials and intuitive worked examples).