This question evaluates a candidate's understanding of probabilistic modeling and statistical decision-making, focusing on the Binomial distribution for session-level events and Bayes' theorem for posterior probabilities in a fraud-detection setting.

You are evaluating a rules-based detector for fake accounts on an online platform. Each account had n = 5 independent sessions last week. In each session, a "suspicious action" happens with probability p_F = 0.5 if the account is fake and p_A = 0.05 if authentic. The detector flags an account if it has at least k suspicious sessions. The prior fake rate is 3%.
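As a notational sketch of the flagging rule described above, let X be the number of suspicious sessions for one account. The symbol X is introduced here for illustration and is not part of the original statement. With independent sessions, X is Binomial and the flag probability is an upper-tail sum:

```latex
\[
X \mid \text{fake} \sim \mathrm{Binomial}(n, p_F), \qquad
X \mid \text{authentic} \sim \mathrm{Binomial}(n, p_A)
\]
\[
P(\text{flag} \mid \text{type}) = P(X \ge k)
  = \sum_{j=k}^{n} \binom{n}{j}\, p^{j} (1 - p)^{n - j},
  \qquad p \in \{p_F,\, p_A\}
\]
```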
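Assumptions: sessions are independent with a fixed suspicious-action probability given the account type, so the number of suspicious sessions per account is Binomial(n, p).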
Answer the following:
(a) For k = 2, compute TPR = P(flag | fake) and FPR = P(flag | authentic) using the Binomial distribution. Show formulas and numeric values.
(b) Using Bayes’ Theorem, compute PPV = P(fake | flag) and NPV = P(authentic | not flagged) for k = 2.
(c) Now a manual review is applied only to flagged accounts. The reviewer independently has sensitivity 0.90 and specificity 0.98. An account is actioned only if both the rule flags it and the reviewer says “fake.” Compute the new overall TPR and FPR, and the revised PPV.
(d) For a population of 1,000,000 accounts, compute expected counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) under the process in (c).
(e) For k ∈ {1, 2, 3, 4, 5}, which k maximizes the F1 score under the 3% prior above, without the manual review step? Outline the computation and provide the numeric choice. Discuss how the optimal k would change if the base fake rate rose to 10%.
(f) Identify which errors in (a)–(e) correspond to Type I vs. Type II errors in this context.
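
One possible way to check the arithmetic for parts (a) through (e) is the short Python sketch below. It is an illustration under the stated assumptions, not an official solution: the helper names (`tail`, `ppv_npv`, `f1`, `best_k`) are hypothetical, and it assumes the reviewer's verdict is independent of the session count given the account type, so the rule and review probabilities multiply.

```python
from math import comb

N_SESSIONS = 5                 # sessions per account (n)
P_FAKE, P_AUTH = 0.5, 0.05     # per-session suspicious probability (p_F, p_A)
PRIOR = 0.03                   # prior fake rate

def tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def ppv_npv(tpr, fpr, prior):
    """Bayes' theorem for P(fake | positive) and P(authentic | negative)."""
    ppv = prior * tpr / (prior * tpr + (1 - prior) * fpr)
    npv = ((1 - prior) * (1 - fpr)
           / ((1 - prior) * (1 - fpr) + prior * (1 - tpr)))
    return ppv, npv

def f1(tpr, fpr, prior):
    """F1 = 2TP / (2TP + FP + FN), using population fractions."""
    tp = prior * tpr
    fp = (1 - prior) * fpr
    fn = prior * (1 - tpr)
    return 2 * tp / (2 * tp + fp + fn)

def best_k(prior):
    """Threshold k in 1..n that maximizes F1 for the rule alone."""
    return max(
        range(1, N_SESSIONS + 1),
        key=lambda k: f1(tail(N_SESSIONS, P_FAKE, k),
                         tail(N_SESSIONS, P_AUTH, k), prior),
    )

# (a) rule alone, k = 2
tpr = tail(N_SESSIONS, P_FAKE, 2)
fpr = tail(N_SESSIONS, P_AUTH, 2)

# (b) posterior quantities for the rule alone
ppv, npv = ppv_npv(tpr, fpr, PRIOR)

# (c) rule AND reviewer: both must say "fake" for an account to be actioned
REVIEW_SENS, REVIEW_SPEC = 0.90, 0.98
tpr2 = tpr * REVIEW_SENS
fpr2 = fpr * (1 - REVIEW_SPEC)
ppv2, _ = ppv_npv(tpr2, fpr2, PRIOR)

# (d) expected counts in a population of 1,000,000 accounts
POPULATION = 1_000_000
fakes = POPULATION * PRIOR
authentic = POPULATION - fakes
counts = {
    "TP": fakes * tpr2,
    "FN": fakes * (1 - tpr2),        # missed by the rule or cleared by review
    "FP": authentic * fpr2,
    "TN": authentic * (1 - fpr2),
}

if __name__ == "__main__":
    print(f"(a) TPR={tpr:.4f} FPR={fpr:.7f}")
    print(f"(b) PPV={ppv:.4f} NPV={npv:.4f}")
    print(f"(c) TPR={tpr2:.5f} FPR={fpr2:.8f} PPV={ppv2:.4f}")
    print("(d)", {name: round(value, 1) for name, value in counts.items()})
    for prior in (0.03, 0.10):
        print(f"(e) prior={prior:.2f} best k={best_k(prior)}")
```

Under these assumptions the sketch gives TPR = 0.8125 and FPR ≈ 0.0226 for k = 2, with PPV ≈ 0.527 and NPV ≈ 0.994. After review, TPR ≈ 0.731, FPR ≈ 0.00045, and PPV ≈ 0.980. The F1-maximizing threshold is k = 3 at a 3% prior and k = 2 at a 10% prior: a higher base rate makes each missed fake cost more relative to each false positive, which favors the lower threshold. For part (f), if "the account is authentic" is taken as the null hypothesis, false positives are Type I errors and false negatives are Type II errors.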