This question evaluates a candidate's competence in probability and statistical inference, covering binomial probability calculations, Bayes' theorem for posterior probability, rare-event base-rate reasoning, and modeling of correlated binary signals in a fraud-screening context.

You are designing a rule-based screener that flags an account if at least k of 5 binary signals fire. Signals behave differently for fake vs. authentic accounts: each signal fires with probability p_fake on a fake account and p_auth on an authentic account.
Let S be the number of signals that fire for an account, so under independence S ~ Binomial(n = 5, p) with p = p_fake or p = p_auth depending on the account's class. An account is flagged if S ≥ k.
Tasks:
(a) For k = 2, compute P(flagged | fake) and P(flagged | authentic) using the binomial distribution.
(b) Assume a base rate P(fake) = 1.5%. Compute P(fake | flagged) via Bayes' theorem and the expected number of flagged accounts on a day when 5,000,000 accounts are scanned. Is a manual review queue of 80,000 per day sufficient?
(c) Find the smallest k such that expected flagged volume fits within 80,000 ± 5% (i.e., 76,000–84,000 per day) while maximizing P(fake | flagged). Show work and justify the precision–recall trade-off.
(d) If signals are not independent and have equal within-class pairwise correlation ρ = 0.2, explain how your answers and assumptions change. Provide a reasonable way to model this dependence and illustrate its impact on volume and precision.
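A minimal numeric sketch for parts (a) and (b) follows. The per-class signal firing probabilities P_FIRE_FAKE and P_FIRE_AUTH are placeholder assumptions (the question's actual values are not reproduced here); the structure of the calculation is what matters, and swapping in the real probabilities changes only the numbers.

```python
from math import comb

# Placeholder per-class signal firing probabilities: assumed values for illustration,
# not taken from the original question.
P_FIRE_FAKE = 0.60    # assumed P(a given signal fires | fake account)
P_FIRE_AUTH = 0.05    # assumed P(a given signal fires | authentic account)
N_SIGNALS = 5
BASE_RATE_FAKE = 0.015            # P(fake) from part (b)
ACCOUNTS_PER_DAY = 5_000_000      # daily scan volume from part (b)

def p_flagged(p_fire: float, k: int, n: int = N_SIGNALS) -> float:
    """P(S >= k) for S ~ Binomial(n, p_fire), i.e. the probability the screener flags."""
    return sum(comb(n, s) * p_fire**s * (1 - p_fire)**(n - s) for s in range(k, n + 1))

k = 2
p_flag_fake = p_flagged(P_FIRE_FAKE, k)    # part (a): P(flagged | fake)
p_flag_auth = p_flagged(P_FIRE_AUTH, k)    # part (a): P(flagged | authentic)

# Part (b): Bayes' theorem and expected daily flag volume.
p_flag = BASE_RATE_FAKE * p_flag_fake + (1 - BASE_RATE_FAKE) * p_flag_auth
p_fake_given_flag = BASE_RATE_FAKE * p_flag_fake / p_flag
expected_flags = ACCOUNTS_PER_DAY * p_flag

print(f"P(flagged|fake)={p_flag_fake:.4f}  P(flagged|authentic)={p_flag_auth:.4f}")
print(f"P(fake|flagged)={p_fake_given_flag:.4f}  expected flags/day={expected_flags:,.0f}")
```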
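For part (c), a sketch of the threshold sweep, reusing p_flagged() and the assumed constants above. Expected volume falls and precision rises as k grows, so printing both per k makes the precision-recall trade-off explicit and shows which thresholds land in the 76,000-84,000 band.

```python
# Part (c): sweep the threshold k under the same placeholder assumptions.
TARGET_LO, TARGET_HI = 76_000, 84_000   # 80,000 +/- 5% review-queue band
for k in range(1, N_SIGNALS + 1):
    pf = p_flagged(P_FIRE_FAKE, k)
    pa = p_flagged(P_FIRE_AUTH, k)
    p_flag = BASE_RATE_FAKE * pf + (1 - BASE_RATE_FAKE) * pa
    volume = ACCOUNTS_PER_DAY * p_flag
    precision = BASE_RATE_FAKE * pf / p_flag        # P(fake | flagged)
    in_band = TARGET_LO <= volume <= TARGET_HI
    print(f"k={k}  volume={volume:>12,.0f}  precision={precision:.4f}  in_band={in_band}")
```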
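For part (d), one reasonable way to encode an equal within-class pairwise correlation is a beta-binomial mixture: signals are conditionally independent given a Beta-distributed per-account firing propensity, and choosing the Beta parameters as below gives marginal firing probability p and pairwise correlation exactly ρ. This is a sketch under the same placeholder probabilities and constants defined earlier, not the only valid dependence model.

```python
from math import comb, exp, lgamma

def log_beta(a: float, b: float) -> float:
    """log of the Beta function, via log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_tail(p: float, rho: float, k: int, n: int = N_SIGNALS) -> float:
    """P(S >= k) when each of n signals fires with marginal probability p and any two
    signals have pairwise correlation rho (0 < rho < 1), modeled as a beta-binomial
    mixture: conditionally independent signals given a Beta(alpha, beta) propensity."""
    alpha = p * (1 - rho) / rho
    beta = (1 - p) * (1 - rho) / rho
    pmf = [comb(n, s) * exp(log_beta(s + alpha, n - s + beta) - log_beta(alpha, beta))
           for s in range(n + 1)]
    return sum(pmf[k:])

RHO, k = 0.2, 2
pf = beta_binomial_tail(P_FIRE_FAKE, RHO, k)
pa = beta_binomial_tail(P_FIRE_AUTH, RHO, k)
p_flag = BASE_RATE_FAKE * pf + (1 - BASE_RATE_FAKE) * pa
print(f"rho={RHO}: P(flag|fake)={pf:.4f}  P(flag|authentic)={pa:.4f}  "
      f"volume={ACCOUNTS_PER_DAY * p_flag:,.0f}  precision={BASE_RATE_FAKE * pf / p_flag:.4f}")
```

Positive within-class correlation pushes probability mass toward the extremes of S, so for the low-probability (authentic) class P(S ≥ k) typically rises relative to independence, inflating expected volume and diluting precision at a fixed k; re-running the part (c) sweep with beta_binomial_tail() in place of p_flagged() shows how the chosen threshold shifts.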