Estimate Fake Accounts Using Data Signals and Sampling

This question evaluates a candidate's competency in designing analytics solutions to estimate the prevalence and absolute count of fake accounts using data signals, sampling strategies, statistical classification, and validation techniques.

Question

Background

A large social platform wants to estimate the proportion and absolute count of fake accounts on the service. "Fake" means inauthentic accounts (bots, impersonations, accounts engaged in coordinated inauthentic behavior); clearly legitimate users are excluded. Estimates should be produced for a defined time window (e.g., monthly) and a defined population (e.g., all accounts, or accounts active in the last 30 days).

Task

Design an analytics approach to estimate the prevalence (percentage) and count of fake accounts on Facebook. Specify:

Data signals/features you would use.
A sampling strategy for labeling and estimation.
A modeling approach to classify fakes and estimate prevalence.
A validation plan and how you would compute confidence intervals.
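
To make the sampling and confidence-interval parts of the task concrete, here is a minimal sketch of a stratified prevalence estimate with a normal-approximation interval. It assumes accounts are split into strata (for example by a risk score), that a simple random sample in each stratum is labeled by human reviewers, and that those labels are accurate. The function name, the stratum sizes, and the label counts are all hypothetical and chosen only for illustration.

```python
import math

def stratified_prevalence(strata, z=1.96):
    """Estimate overall fake-account prevalence from per-stratum labeled samples.

    strata: list of (N_h, n_h, k_h) tuples, where
        N_h = number of accounts in the stratum (population size),
        n_h = number of accounts sampled and labeled (n_h >= 2),
        k_h = number of sampled accounts labeled fake.
    z: normal quantile for the interval (1.96 gives roughly 95%).
    """
    total = sum(N_h for N_h, _, _ in strata)
    p_hat = 0.0
    variance = 0.0
    for N_h, n_h, k_h in strata:
        weight = N_h / total              # stratum share of the population
        p_h = k_h / n_h                   # fake rate observed in the sample
        fpc = 1.0 - n_h / N_h             # finite population correction
        p_hat += weight * p_h
        variance += weight ** 2 * fpc * p_h * (1.0 - p_h) / (n_h - 1)
    se = math.sqrt(variance)
    low = max(0.0, p_hat - z * se)
    high = min(1.0, p_hat + z * se)
    return {
        "prevalence": p_hat,
        "prevalence_ci": (low, high),
        "count": p_hat * total,
        "count_ci": (low * total, high * total),
    }

# Hypothetical strata: low-, medium-, and high-risk accounts.
example_strata = [
    (8_000_000, 1000, 10),
    (1_500_000, 1000, 80),
    (500_000, 1000, 400),
]
result = stratified_prevalence(example_strata)
print(result)
```

With these made-up numbers the point estimate is about 4% of 10 million accounts, or roughly 400,000 fakes. Oversampling the high-risk stratum is a design choice: it spends labeling budget where fakes are concentrated, and the stratum weights undo the oversampling in the final estimate. When a stratum has very few positives, a Wilson or bootstrap interval may be a better fit than the normal approximation used here.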

Hints: Consider random/stratified sampling, supervised classification with manual labeling, capture–recapture, and confidence intervals.
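
The capture–recapture hint can be sketched with the Chapman estimator, a bias-corrected form of the Lincoln-Petersen estimator. The sketch assumes two detection sources (say, an automated classifier and user reports) that flag fake accounts independently and with equal catchability for every fake account. Those assumptions rarely hold exactly in practice, so the result is best read as a cross-check on the sampling estimate. The function name and all counts below are hypothetical.

```python
import math

def chapman_estimate(n1, n2, m, z=1.96):
    """Capture-recapture estimate of the total number of fake accounts.

    n1: fakes flagged by source A
    n2: fakes flagged by source B
    m:  fakes flagged by both sources
    Assumes the two sources are independent and every fake account is
    equally likely to be caught by each source.
    """
    if m > min(n1, n2):
        raise ValueError("overlap cannot exceed either source's count")
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    # Standard variance estimator for the Chapman estimate.
    variance = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / (
        (m + 1) ** 2 * (m + 2)
    )
    se = math.sqrt(variance)
    # The true total cannot be below the number of distinct fakes observed.
    observed = n1 + n2 - m
    return {
        "estimate": n_hat,
        "ci": (max(observed, n_hat - z * se), n_hat + z * se),
    }

# Hypothetical counts: 1200 flagged by A, 900 by B, 300 by both.
cr_result = chapman_estimate(1200, 900, 300)
print(cr_result)
```

For these counts the plain Lincoln-Petersen value would be 1200 × 900 / 300 = 3600, and the Chapman correction lands slightly below that. If the two sources tend to catch the same kinds of fakes (positive dependence), the overlap is inflated and the total is underestimated, which is one reason to compare this figure against a labeled random sample.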
