This question tests a candidate's ability to design an analytics solution that estimates the prevalence and absolute count of fake accounts, drawing on data signals, sampling strategies, statistical classification, and validation techniques.

A large social platform wants to estimate the proportion and absolute count of fake accounts on the service. "Fake" means inauthentic accounts (bots, impersonations, accounts engaged in coordinated inauthentic behavior); clearly legitimate users are excluded. Estimates should be produced for a defined time window (e.g., monthly) and a defined population (e.g., all accounts, or accounts active in the last 30 days).
Design an analytics approach to estimate the prevalence (percentage) and count of fake accounts on Facebook. Specify:
Hints: Consider random/stratified sampling, supervised classification with manual labeling, capture–recapture, and confidence intervals.
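To make the hints concrete, here is a minimal Python sketch of two of them. It is one possible approach under stated assumptions, not a reference answer. `stratified_prevalence` combines manually labeled samples from each stratum into a population-level prevalence with a normal-approximation confidence interval. `chapman_estimate` is the Chapman form of the two-source capture–recapture estimator, which assumes the two detection sources flag fake accounts independently of each other. The strata names, all counts, and the function names are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    population: int  # N_h: accounts in this stratum
    sampled: int     # n_h: accounts drawn at random and manually labeled
    fake: int        # labeled accounts judged fake


def stratified_prevalence(strata, z=1.96):
    """Stratified estimate of fake-account prevalence and count.

    Assumes simple random sampling within each stratum and labels
    that are treated as ground truth (no labeling error modeled).
    """
    total = sum(s.population for s in strata)
    p_hat, var = 0.0, 0.0
    for s in strata:
        if s.sampled < 2:
            raise ValueError(f"stratum {s.name!r} needs at least 2 samples")
        weight = s.population / total
        p_h = s.fake / s.sampled
        fpc = 1 - s.sampled / s.population  # finite population correction
        p_hat += weight * p_h
        var += weight**2 * fpc * p_h * (1 - p_h) / (s.sampled - 1)
    half_width = z * math.sqrt(var)
    low, high = max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)
    return {
        "prevalence": p_hat,
        "prevalence_ci": (low, high),
        "count": p_hat * total,
        "count_ci": (low * total, high * total),
    }


def chapman_estimate(n1, n2, m):
    """Chapman capture-recapture estimate of the total number of fakes.

    n1, n2: fakes flagged by source 1 and source 2; m: flagged by both.
    Assumes the two sources detect fakes independently.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


if __name__ == "__main__":
    # Hypothetical strata defined by an upstream risk score.
    strata = [
        Stratum("low_risk", population=900_000, sampled=1_000, fake=10),
        Stratum("high_risk", population=100_000, sampled=1_000, fake=300),
    ]
    print(stratified_prevalence(strata))
    print(chapman_estimate(n1=99, n2=199, m=19))
```

Oversampling the high-risk stratum narrows the interval for a fixed labeling budget, because most of the variance comes from strata where fakes are common. The capture–recapture figure is best read as a cross-check: if the two sources tend to catch the same kinds of fakes, the overlap is inflated and the estimate is biased low.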