Estimating Fake Accounts on a Social Network
Background
A large social platform wants to estimate the proportion and absolute count of fake accounts on the service. "Fake" means inauthentic accounts (bots, impersonation accounts, and accounts engaged in coordinated inauthentic behavior); accounts belonging to legitimate users are excluded. Estimates should be produced for a defined time window (e.g., monthly) and a defined population (e.g., all accounts, or accounts active in the last 30 days).
Task
Design an analytics approach to estimate the prevalence (percentage) and count of fake accounts on Facebook. Specify:
- Data signals/features you would use.
- A sampling strategy for labeling and estimation.
- A modeling approach to classify fakes and estimate prevalence.
- A validation plan and how you would compute confidence intervals.
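As one illustration of the sampling and confidence-interval items above, here is a minimal Python sketch of a stratified prevalence estimate. It assumes accounts are partitioned into strata (for example by a classifier risk score), that a simple random sample from each stratum is labeled by human reviewers, and that a normal-approximation interval is acceptable. The function name `stratified_prevalence` and the stratum figures are hypothetical and only for illustration.

```python
import math

def stratified_prevalence(strata, z=1.96):
    """Estimate fake-account prevalence and count from labeled stratified samples.

    strata: list of (population_size, sample_size, fakes_in_sample) tuples,
            one per stratum (hypothetical inputs).
    z:      normal quantile for the interval (1.96 gives roughly 95%).
    """
    total = sum(N for N, _, _ in strata)
    p_hat = 0.0
    variance = 0.0
    for N, n, k in strata:
        weight = N / total            # stratum share of the population
        p_h = k / n                   # labeled fake rate within the stratum
        p_hat += weight * p_h
        fpc = 1 - n / N               # finite population correction
        variance += weight ** 2 * fpc * p_h * (1 - p_h) / (n - 1)
    se = math.sqrt(variance)
    low = max(0.0, p_hat - z * se)
    high = min(1.0, p_hat + z * se)
    # Scale the proportion and its interval to an absolute count.
    return p_hat, (low, high), total * p_hat, (total * low, total * high)

# Made-up strata: low-, medium-, and high-risk accounts.
example_strata = [
    (900_000, 1000, 10),
    (90_000, 1000, 100),
    (10_000, 1000, 600),
]
prevalence, prevalence_ci, count, count_ci = stratified_prevalence(example_strata)
print(f"prevalence={prevalence:.4f}, CI={prevalence_ci}, count={count:.0f}")
```

Oversampling the high-risk stratum tends to narrow the interval for a fixed labeling budget, because most of the uncertainty about a rare class sits where that class is concentrated. The sketch treats human labels as ground truth; in practice, label error would need its own adjustment.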
Hints: Consider random/stratified sampling, supervised classification with manual labeling, capture–recapture, and confidence intervals.
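For the capture-recapture hint, a minimal sketch using the Chapman variant of the Lincoln-Petersen estimator may help. It assumes two detection methods that flag fake accounts independently within a closed population over the time window, an assumption that rarely holds exactly, so the result is better read as a rough cross-check than as a final estimate. The function name `chapman_estimate` and the input counts are hypothetical.

```python
import math

def chapman_estimate(n1, n2, m, z=1.96):
    """Chapman capture-recapture estimate of the total number of fake accounts.

    n1: fakes flagged by method A (hypothetical, e.g. a classifier)
    n2: fakes flagged by method B (hypothetical, e.g. user reports)
    m:  fakes flagged by both methods
    """
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    # Standard variance estimate for the Chapman estimator.
    variance = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))
    se = math.sqrt(variance)
    observed = n1 + n2 - m            # distinct fakes actually seen
    low = max(observed, n_hat - z * se)
    high = n_hat + z * se
    return n_hat, (low, high)

# Made-up counts for illustration only.
estimate, interval = chapman_estimate(n1=99, n2=199, m=19)
print(estimate, interval)
```

If the two methods are positively correlated (both catch the same easy cases), the overlap is inflated and the estimate is biased downward, which is one reason to compare it against the sampling-based estimate.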