[Analytics Reasoning] Impact of Malicious Accounts on Meta
Meta
Apr 7, 2025, 3:24 AM
Data Scientist
Onsite
Analytics & Experimentation
Social Network: Friend-Request Risk and Bad-Account Detection
Context and Assumptions
On a large social network (similar to Facebook), 1% of accounts are malicious ("bad").
Bad accounts send friend requests at 10× the rate of good accounts.
A binary classifier achieves 95% true positive rate (TPR) and 95% true negative rate (TNR).
Unless otherwise noted, treat incoming friend requests as independent draws from the population of senders over a short time window (stationary behavior).
Questions
Single Friend Request Probability
What is the probability that a received friend request comes from a bad account?
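One way to work this out (a sketch, not an official solution): weight each sender population by its request volume and apply Bayes' rule. The constants below come straight from the stated assumptions (1% prevalence, 10× sending rate).

```python
# P(sender is bad | a request arrives): weight each population by request volume.
P_BAD = 0.01      # prevalence of bad accounts (given in the prompt)
RATE_RATIO = 10   # bad accounts send requests at 10x the good-account rate

def p_request_from_bad(p_bad=P_BAD, ratio=RATE_RATIO):
    """Share of all friend requests that originate from bad accounts."""
    bad_volume = p_bad * ratio       # relative request volume from bad accounts
    good_volume = (1 - p_bad) * 1    # relative request volume from good accounts
    return bad_volume / (bad_volume + good_volume)

print(round(p_request_from_bad(), 4))  # 0.0917, i.e. roughly 9.2%
```

So even though only 1% of accounts are bad, their elevated sending rate means about 1 in 11 incoming requests is from a bad account.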
Multiple Friend Requests
What is the probability that, out of five friend requests, at least one originates from a bad account?
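A sketch of the "at least one" calculation, using the complement of "all five requests come from good accounts" and the independence assumption stated above:

```python
# P(at least one of five requests is from a bad account), assuming independent senders.
p_bad_request = 0.1 / 1.09  # ~9.17%, the single-request probability from above

p_at_least_one = 1 - (1 - p_bad_request) ** 5
print(round(p_at_least_one, 3))  # 0.382
```

Independence is the key modeling assumption here; in practice bad accounts often burst requests, which would make these five draws correlated.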
Model Reliability
If the model flags an account as bad, how likely is it truly malicious (given TPR = 95% and TNR = 95%)? State assumptions about which population you’re flagging (all accounts vs. request senders).
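A sketch of the positive predictive value (PPV) calculation under both population choices the question asks about. Note that FPR = 1 − TNR, and that the prior changes depending on whether you score all accounts (prior 1%) or request senders (prior ≈ 9.17%, from the first question):

```python
def ppv(prior, tpr=0.95, tnr=0.95):
    """Posterior P(truly bad | flagged) via Bayes' rule; FPR = 1 - TNR."""
    fpr = 1 - tnr
    return tpr * prior / (tpr * prior + fpr * (1 - prior))

print(round(ppv(0.01), 3))        # flagging all accounts: ~0.161
print(round(ppv(0.1 / 1.09), 3))  # flagging request senders: ~0.657
```

The base-rate effect dominates: a "95% accurate" classifier applied to the full population is wrong about 5 out of 6 times it flags, while restricting it to the bad-enriched sender population raises precision to roughly two-thirds.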
Data and Features
Which data types (behavior logs, friend-request patterns, reported incidents, etc.) are most relevant to classify accounts accurately?
Assessing Bad-Account Prevalence
Propose methods (e.g., stratified or random sampling) to determine whether the bad-account issue is substantial enough to warrant intervention.
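One simple sampling approach is a random audit: draw a simple random sample of accounts, have them manually reviewed, and report the prevalence estimate with a confidence interval. The sketch below simulates such an audit with hypothetical labels (the `random.random() < 0.01` line stands in for real reviewer verdicts) and uses a normal-approximation interval:

```python
import math
import random

def estimate_prevalence(labels, z=1.96):
    """Point estimate and normal-approximation 95% CI for bad-account
    prevalence from a simple random sample of manually reviewed accounts
    (labels: 1 = reviewer marked bad, 0 = good)."""
    n = len(labels)
    p_hat = sum(labels) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (max(0.0, p_hat - z * se), min(1.0, p_hat + z * se))

random.seed(0)
# Hypothetical audit of 5,000 accounts with ~1% true prevalence.
sample = [1 if random.random() < 0.01 else 0 for _ in range(5000)]
p_hat, ci = estimate_prevalence(sample)
print(f"prevalence ≈ {p_hat:.4f}, 95% CI {ci[0]:.4f}–{ci[1]:.4f}")
```

Stratifying the sample (e.g., by account age or request volume) would tighten the estimate in the rare-but-costly strata where bad accounts concentrate.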
Defining a "Bad User"
What characteristics or behaviors would qualify an account as "bad" (e.g., spammer, scammer, bot)?
Platform Impact
How could malicious users affect platform trust, user experience, and reputation?
Friend Request Implications
How might frequent friend requests from bad accounts impact legitimate users and community health?
Follow-Up Considerations
Threshold Tuning: How does adjusting the decision threshold change false positives and false negatives?
Cost–Benefit Analysis: Compare automated detection vs. manual review.
Long-Term Effects of False Positives: How might mislabeling legitimate users erode trust?
Advanced Feature Engineering: What additional signals (e.g., content quality, login behavior) could improve detection?
Scaling and Fairness: How to maintain performance and fairness at very large scale without disproportionate impacts on subgroups?
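The threshold-tuning trade-off above can be made concrete with a small simulation. This is a sketch on synthetic scores (the Gaussian score distributions are an assumption, not Meta's model): raising the threshold trades false positives for false negatives.

```python
import random

random.seed(1)
# Hypothetical classifier scores: bad accounts score high on average.
bad_scores = [min(1.0, max(0.0, random.gauss(0.8, 0.15))) for _ in range(100)]
good_scores = [min(1.0, max(0.0, random.gauss(0.2, 0.15))) for _ in range(9900)]

def confusion_at(threshold):
    """False positives (good flagged) and false negatives (bad missed)."""
    fp = sum(s >= threshold for s in good_scores)
    fn = sum(s < threshold for s in bad_scores)
    return fp, fn

for t in (0.3, 0.5, 0.7):
    fp, fn = confusion_at(t)
    print(f"threshold={t}: FP={fp}, FN={fn}")
```

A cost-weighted objective (e.g., cost of one wrongly suspended legitimate user vs. one missed spammer) would pick the operating point on this curve.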