Scenario
You are a data scientist responsible for trust and safety at a social‑commerce platform. You need to design a system to detect and mitigate fake accounts (bots, spam, fraud, coordinated inauthentic behavior) while minimizing friction for real users.
Questions
- Problem framing: How would you define "fake accounts" and decide on actions (auto-block, soft friction, human review)? What are the business costs of false positives vs. false negatives? (See the cost-ratio sketch after the hints.)
- Features: What signals would you engineer (signup, device/network, behavior, content, social graph, marketplace/ads) and over what time windows? How would you handle cold start? (See the feature-engineering sketch after the hints.)
- Modeling: Would you use supervised, unsupervised, semi/weakly-supervised, or graph-based methods? Describe a classifier architecture you would deploy at scale. (See the baseline-model sketch after the hints.)
- Evaluation: How would you evaluate performance (metrics, splits, precision–recall trade-offs), choose thresholds for different actions, and run A/B tests to measure business impact? (See the threshold-selection sketch after the hints.)
- Monitoring and drift: How would you monitor data/concept drift, recalibrate, and retrain? What guardrails would you put in place? (See the PSI sketch after the hints.)
- Business impact: What harms do fake users cause across the buyer, seller, and ads ecosystems?
- Interpreting prevalence: If 3% of accounts are fake, what conclusions or next steps does that suggest? (See the base-rate calculation after the hints.)
Hints: Discuss feature engineering, supervised vs. unsupervised methods, precision–recall trade-offs, and A/B testing for business impact.
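The problem-framing question turns on how costly a wrongly blocked real user is relative to a missed fake account. A minimal sketch of that trade-off, assuming a calibrated fake-probability score and made-up per-account costs (the dollar figures are illustrative assumptions, not business data):

```python
# Illustrative costs only; real values would come from support, fraud-loss,
# and retention data.
cost_false_positive = 25.0  # harm of auto-blocking a genuine user
cost_false_negative = 8.0   # harm of letting a fake account through

# With a calibrated probability p = P(fake), blocking minimizes expected cost
# when p * cost_false_negative > (1 - p) * cost_false_positive, i.e. when p
# exceeds the ratio below.
block_threshold = cost_false_positive / (cost_false_positive + cost_false_negative)
print(f"Expected-cost-minimizing block threshold: {block_threshold:.2f}")  # ~0.76
```

Because softer actions (friction, human review) are cheaper than an outright block, their thresholds can reasonably sit well below this one.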
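For the features question, a minimal sketch of windowed aggregation over a hypothetical event log; the column names (account_id, event_time, event_type, device_id, ip_subnet) and the 1h/24h/7d windows are assumptions for illustration, not the platform's actual schema:

```python
import pandas as pd

def signup_window_features(events: pd.DataFrame, signups: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-account signals over fixed windows after signup (hypothetical schema)."""
    df = events.merge(signups[["account_id", "signup_time"]], on="account_id")
    df["age_hours"] = (df["event_time"] - df["signup_time"]).dt.total_seconds() / 3600.0

    parts = []
    for window_h in (1, 24, 168):  # 1 hour, 1 day, 7 days after signup
        w = df[df["age_hours"] <= window_h]
        agg = w.groupby("account_id").agg(
            n_events=("event_type", "size"),
            n_devices=("device_id", "nunique"),
            n_ip_subnets=("ip_subnet", "nunique"),
            follow_share=("event_type", lambda s: (s == "follow").mean()),
        )
        agg.columns = [f"{c}_{window_h}h" for c in agg.columns]
        parts.append(agg)

    # Accounts with no events in a window get zeros, which is itself a useful signal.
    return pd.concat(parts, axis=1).fillna(0.0)
```

For brand-new accounts (the cold-start case), only signup-time and device/network signals exist, so an early-life model or rule set would typically score accounts before the behavioral windows fill in.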
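For the modeling question, a minimal supervised baseline, assuming historical takedown decisions serve as labels and reusing the hypothetical feature table from the previous sketch; a production system would more likely combine rules, a cheap first-stage filter, graph-based signals, and a model of this kind rather than rely on it alone:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.utils.class_weight import compute_sample_weight

def train_baseline(X: np.ndarray, y: np.ndarray) -> HistGradientBoostingClassifier:
    """Gradient-boosted trees as a tabular baseline; fakes are a rare class, so reweight."""
    clf = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.05)
    clf.fit(X, y, sample_weight=compute_sample_weight("balanced", y))
    return clf

# Scores in [0, 1] feed the thresholding logic in the evaluation sketch below.
# scores = train_baseline(X_train, y_train).predict_proba(X_live)[:, 1]
```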
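For the evaluation question, a sketch of choosing separate thresholds from the precision–recall curve: a strict, high-precision cutoff for automatic blocking and a looser one for routing to human review. The 0.99 and 0.90 precision targets are illustrative assumptions:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

def pick_thresholds(y_true, scores, block_precision=0.99, review_precision=0.90):
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    precision, recall = precision[:-1], recall[:-1]  # align with thresholds

    def threshold_for(target):
        ok = precision >= target
        if not ok.any():
            return None  # the model cannot reach this precision on held-out data
        # Among operating points meeting the precision target, keep the highest recall.
        idx = int(np.argmax(np.where(ok, recall, -1.0)))
        return float(thresholds[idx])

    return {
        "average_precision": float(average_precision_score(y_true, scores)),
        "auto_block_threshold": threshold_for(block_precision),
        "human_review_threshold": threshold_for(review_precision),
    }
```

For the business-impact half of the question, the chosen thresholds would then be A/B tested against the incumbent policy on outcome metrics (victim reports, seller complaints, false-block appeals) rather than on model metrics alone.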
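For the monitoring question, a minimal population stability index (PSI) sketch comparing the live score distribution against a reference window such as the training data; the 10 bins and the 0.2 alerting cutoff are common rules of thumb, not fixed requirements:

```python
import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference score distribution and the current one."""
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, bins + 1)))
    current = np.clip(current, edges[0], edges[-1])  # keep live scores inside the reference range
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    eps = 1e-6  # avoid log(0) and division by zero in empty bins
    ref_frac, cur_frac = ref_frac + eps, cur_frac + eps
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Example guardrail: investigate or trigger retraining when drift is large.
# if population_stability_index(train_scores, live_scores) > 0.2: page_on_call()
```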
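For the prevalence question, a worked base-rate calculation: at 3% prevalence, even a detector with strong per-account accuracy flags a substantial share of real users, which argues for calibrated scores and tiered actions rather than blanket auto-blocking. The 95% sensitivity and 98% specificity figures are illustrative assumptions:

```python
prevalence = 0.03    # share of accounts that are actually fake
sensitivity = 0.95   # P(flagged | fake), assumed for illustration
specificity = 0.98   # P(not flagged | real), assumed for illustration

flagged_fake = sensitivity * prevalence              # 0.0285 of all accounts
flagged_real = (1 - specificity) * (1 - prevalence)  # 0.0194 of all accounts
ppv = flagged_fake / (flagged_fake + flagged_real)

print(f"Share of flagged accounts that are actually fake: {ppv:.1%}")  # ~59.5%
```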