Scenario
You are a data scientist responsible for trust and safety at a social‑commerce platform. You need to design a system to detect and mitigate fake accounts (bots, spam, fraud, coordinated inauthentic behavior) while minimizing friction for real users.
Questions
- Problem framing: How would you define "fake accounts" and decide on actions (auto-block, soft friction, human review)? What are the business costs of false positives vs. false negatives? (See the cost-ratio sketch after the hints.)
- Features: What signals would you engineer (signup, device/network, behavior, content, social graph, marketplace/ads) and over what time windows? How would you handle cold start? (See the feature-engineering sketch after the hints.)
- Modeling: Would you use supervised, unsupervised, semi/weakly-supervised, or graph-based methods? Describe a classifier architecture you would deploy at scale. (See the baseline-model sketch after the hints.)
- Evaluation: How would you evaluate performance (metrics, splits, precision–recall trade-offs), choose thresholds for different actions, and run A/B tests to measure business impact? (See the threshold-selection sketch after the hints.)
- Monitoring and drift: How would you monitor data/concept drift, recalibrate, and retrain? What guardrails would you put in place? (See the PSI sketch after the hints.)
- Business impact: What harms do fake users cause across the buyer, seller, and ads ecosystems?
- Interpreting prevalence: If 3% of accounts are fake, what conclusions or next steps does that suggest? (See the base-rate calculation after the hints.)
Hints: Discuss feature engineering, supervised vs. unsupervised methods, precision–recall trade-offs, and A/B testing for business impact.
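The problem-framing question turns on how costly a wrongly blocked real user is relative to a missed fake account. A minimal sketch of that trade-off, assuming a calibrated fake-probability score and made-up per-account costs (the dollar figures are illustrative assumptions, not business data):

```python
# Illustrative costs only; real values would come from support, fraud-loss,
# and retention data.
cost_false_positive = 25.0  # harm of auto-blocking a genuine user
cost_false_negative = 8.0   # harm of letting a fake account through

# With a calibrated probability p = P(fake), blocking minimizes expected cost
# when p * cost_false_negative > (1 - p) * cost_false_positive, i.e. when p
# exceeds the ratio below.
block_threshold = cost_false_positive / (cost_false_positive + cost_false_negative)
print(f"Expected-cost-minimizing block threshold: {block_threshold:.2f}")  # ~0.76
```

Because softer actions (friction, human review) are cheaper than an outright block, their thresholds can reasonably sit well below this one.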
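For the features question, a minimal sketch of windowed aggregation over a hypothetical event log; the column names (account_id, event_time, event_type, device_id, ip_subnet) and the 1h/24h/7d windows are assumptions for illustration, not the platform's actual schema:

```python
import pandas as pd

def signup_window_features(events: pd.DataFrame, signups: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-account signals over fixed windows after signup (hypothetical schema)."""
    df = events.merge(signups[["account_id", "signup_time"]], on="account_id")
    df["age_hours"] = (df["event_time"] - df["signup_time"]).dt.total_seconds() / 3600.0

    parts = []
    for window_h in (1, 24, 168):  # 1 hour, 1 day, 7 days after signup
        w = df[df["age_hours"] <= window_h]
        agg = w.groupby("account_id").agg(
            n_events=("event_type", "size"),
            n_devices=("device_id", "nunique"),
            n_ip_subnets=("ip_subnet", "nunique"),
            follow_share=("event_type", lambda s: (s == "follow").mean()),
        )
        agg.columns = [f"{c}_{window_h}h" for c in agg.columns]
        parts.append(agg)

    # Accounts with no events in a window get zeros, which is itself a useful signal.
    return pd.concat(parts, axis=1).fillna(0.0)
```

For brand-new accounts (the cold-start case), only signup-time and device/network signals exist, so an early-life model or rule set would typically score accounts before the behavioral windows fill in.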
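For the modeling question, a minimal supervised baseline, assuming historical takedown decisions serve as labels and reusing the hypothetical feature table from the previous sketch; a production system would more likely combine rules, a cheap first-stage filter, graph-based signals, and a model of this kind rather than rely on it alone:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.utils.class_weight import compute_sample_weight

def train_baseline(X: np.ndarray, y: np.ndarray) -> HistGradientBoostingClassifier:
    """Gradient-boosted trees as a tabular baseline; fakes are a rare class, so reweight."""
    clf = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.05)
    clf.fit(X, y, sample_weight=compute_sample_weight("balanced", y))
    return clf

# Scores in [0, 1] feed the thresholding logic in the evaluation sketch below.
# scores = train_baseline(X_train, y_train).predict_proba(X_live)[:, 1]
```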
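For the evaluation question, a sketch of choosing separate thresholds from the precision–recall curve: a strict, high-precision cutoff for automatic blocking and a looser one for routing to human review. The 0.99 and 0.90 precision targets are illustrative assumptions:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

def pick_thresholds(y_true, scores, block_precision=0.99, review_precision=0.90):
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    precision, recall = precision[:-1], recall[:-1]  # align with thresholds

    def threshold_for(target):
        ok = precision >= target
        if not ok.any():
            return None  # the model cannot reach this precision on held-out data
        # Among operating points meeting the precision target, keep the highest recall.
        idx = int(np.argmax(np.where(ok, recall, -1.0)))
        return float(thresholds[idx])

    return {
        "average_precision": float(average_precision_score(y_true, scores)),
        "auto_block_threshold": threshold_for(block_precision),
        "human_review_threshold": threshold_for(review_precision),
    }
```

For the business-impact half of the question, the chosen thresholds would then be A/B tested against the incumbent policy on outcome metrics (victim reports, seller complaints, false-block appeals) rather than on model metrics alone.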
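For the monitoring question, a minimal population stability index (PSI) sketch comparing the live score distribution against a reference window such as the training data; the 10 bins and the 0.2 alerting cutoff are common rules of thumb, not fixed requirements:

```python
import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference score distribution and the current one."""
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, bins + 1)))
    current = np.clip(current, edges[0], edges[-1])  # keep live scores inside the reference range
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    eps = 1e-6  # avoid log(0) and division by zero in empty bins
    ref_frac, cur_frac = ref_frac + eps, cur_frac + eps
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Example guardrail: investigate or trigger retraining when drift is large.
# if population_stability_index(train_scores, live_scores) > 0.2: page_on_call()
```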
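For the prevalence question, a worked base-rate calculation: at 3% prevalence, even a detector with strong per-account accuracy flags a substantial share of real users, which argues for calibrated scores and tiered actions rather than blanket auto-blocking. The 95% sensitivity and 98% specificity figures are illustrative assumptions:

```python
prevalence = 0.03    # share of accounts that are actually fake
sensitivity = 0.95   # P(flagged | fake), assumed for illustration
specificity = 0.98   # P(not flagged | real), assumed for illustration

flagged_fake = sensitivity * prevalence              # 0.0285 of all accounts
flagged_real = (1 - specificity) * (1 - prevalence)  # 0.0194 of all accounts
ppv = flagged_fake / (flagged_fake + flagged_real)

print(f"Share of flagged accounts that are actually fake: {ppv:.1%}")  # ~59.5%
```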