Detecting Fake Accounts on a Social Network
Context
You are a data scientist at a large social platform. The goal is to detect and mitigate fake or abusive accounts while minimizing harm to legitimate users. Fake accounts are rare relative to legitimate ones, so class imbalance is a central concern, as are noisy labels and the high business cost of mistakes in either direction.
Tasks
- End-to-end approach: Describe how you would design an end-to-end system to identify fake accounts.
- Data and labels: Explain how you would obtain training labels and address label noise and class imbalance.
- Features: Propose key feature families and give examples for each.
- Modeling and training: Outline model choices, training setup, handling imbalance, and how you’d prevent leakage and drift.
- Evaluation and prioritization: Specify offline and online evaluation metrics, how you would set thresholds, and how you would prioritize precision vs. recall given business costs.
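To make the threshold and business-cost part of the last task concrete, here is a minimal sketch of one possible approach: choose the score threshold that minimizes total misclassification cost on a labeled validation set. The function name pick_threshold, the toy scores and labels, and the cost values are all hypothetical and chosen only for illustration. A real system would estimate the costs from business data and validate the threshold online.

```python
from typing import Sequence, Tuple


def pick_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    cost_fp: float,
    cost_fn: float,
) -> Tuple[float, float]:
    """Return (threshold, total_cost) with the lowest cost on the given data.

    An account is flagged as fake when score >= threshold.
    labels: 1 = fake, 0 = legitimate.
    cost_fp: assumed cost of flagging a legitimate account.
    cost_fn: assumed cost of missing a fake account.
    """
    # Every observed score is a candidate; +inf means "flag nothing".
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Toy validation set (illustrative values only).
scores = [0.05, 0.10, 0.20, 0.40, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1]

# Harming a legitimate user is assumed 10x worse than missing a fake.
precision_first = pick_threshold(scores, labels, cost_fp=10, cost_fn=1)

# Missing a fake account is assumed 10x worse than a false flag.
recall_first = pick_threshold(scores, labels, cost_fp=1, cost_fn=10)
```

With these toy numbers, the precision-first setting selects the higher threshold (0.90) and the recall-first setting the lower one (0.40), which shows how the cost ratio drives the precision vs. recall trade-off.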