Fake-Account Detection with Binomial Sessions and Bayes Updating
You are evaluating a rules-based detector for fake accounts on an online platform. Each account logged n = 5 independent sessions last week. In each session, a "suspicious action" occurs with probability p_F = 0.5 if the account is fake and p_A = 0.05 if it is authentic. The detector flags an account if it has at least k suspicious sessions. The prior fake rate is 3%.
Assumptions:
- Sessions are independent given account type (fake vs authentic).
- In part (c), the manual reviewer’s decision is independent of the rule conditional on the true label and is applied only to flagged accounts.
Answer the following:
(a) For k = 2, compute TPR = P(flag | fake) and FPR = P(flag | authentic) using the Binomial distribution. Show formulas and numeric values.
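The binomial tail probabilities in (a) can be checked with a short sketch (standard library only; the helper name `tail_prob` is illustrative):

```python
from math import comb

def tail_prob(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, k = 5, 2
tpr = tail_prob(n, 0.5, k)   # TPR = P(flag | fake)
fpr = tail_prob(n, 0.05, k)  # FPR = P(flag | authentic)
print(f"TPR = {tpr:.4f}, FPR = {fpr:.6f}")
```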
(b) Using Bayes’ Theorem, compute PPV = P(fake | flag) and NPV = P(authentic | not flagged) for k = 2.
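The Bayes update in (b) is a one-liner once the part-(a) rates are in hand; a sketch (rates hard-coded from the k = 2 case for readability):

```python
# Bayes' Theorem for PPV and NPV; prior and rates as given in the problem.
prior = 0.03
tpr, fpr = 0.8125, 0.0225925  # TPR and FPR for k = 2 from part (a)

ppv = prior * tpr / (prior * tpr + (1 - prior) * fpr)
npv = (1 - prior) * (1 - fpr) / ((1 - prior) * (1 - fpr) + prior * (1 - tpr))
print(f"PPV = {ppv:.4f}, NPV = {npv:.4f}")
```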
(c) Now a manual review is applied only to flagged accounts. The reviewer independently has sensitivity 0.90 and specificity 0.98. An account is actioned only if both the rule flags it and the reviewer says “fake.” Compute the new overall TPR and FPR, and the revised PPV.
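Because the reviewer is conditionally independent of the rule given the true label, the two stages compose by multiplication; a sketch of the composition:

```python
# Two-stage pipeline: actioned = rule flags AND reviewer says "fake".
prior = 0.03
tpr_rule, fpr_rule = 0.8125, 0.0225925  # k = 2 rates from part (a)
sens, spec = 0.90, 0.98                 # reviewer sensitivity / specificity

tpr_all = tpr_rule * sens          # both stages must fire on a fake account
fpr_all = fpr_rule * (1 - spec)    # both stages must err on an authentic one
ppv_all = prior * tpr_all / (prior * tpr_all + (1 - prior) * fpr_all)
print(tpr_all, fpr_all, ppv_all)
```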
(d) For a population of 1,000,000 accounts, compute expected counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) under the process in (c).
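Expected counts are just the overall rates from (c) scaled by the subpopulation sizes; a sketch (overall rates hard-coded as an assumption):

```python
# Expected confusion-matrix counts for N accounts under the two-stage process.
N = 1_000_000
prior = 0.03
tpr, fpr = 0.73125, 0.00045185  # overall TPR and FPR from part (c)

fakes, auths = N * prior, N * (1 - prior)
tp = fakes * tpr          # fakes actioned
fn = fakes * (1 - tpr)    # fakes missed at either stage
fp = auths * fpr          # authentic accounts actioned
tn = auths - fp           # authentic accounts not actioned
print(tp, fp, tn, fn)
```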
(e) For k ∈ {1, 2, 3, 4, 5}, which k maximizes the F1 score on the prior above without the manual review step? Outline the computation and provide the numeric choice. Discuss how the optimal k would change if the base fake rate rose to 10%.
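The sweep in (e) can be outlined in a few lines: since F1 is scale-invariant, expected per-account quantities suffice in place of raw counts. A sketch (helper names are illustrative):

```python
from math import comb

def tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def f1(prior: float, k: int, n: int = 5,
       p_fake: float = 0.5, p_auth: float = 0.05) -> float:
    """F1 of the rule-only detector at threshold k, using expected rates."""
    tpr, fpr = tail(n, p_fake, k), tail(n, p_auth, k)
    tp, fp, fn = prior * tpr, (1 - prior) * fpr, prior * (1 - tpr)
    return 2 * tp / (2 * tp + fp + fn)

for prior in (0.03, 0.10):
    best_k = max(range(1, 6), key=lambda k: f1(prior, k))
    print(f"prior = {prior}: best k = {best_k}, F1 = {f1(prior, best_k):.4f}")
```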
(f) Identify which errors in (a)–(e) correspond to Type I vs. Type II errors in this context.