[Analytics Reasoning] Impact of Malicious Accounts on Meta
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: Medium
Interview Round: Onsite
Context & Assumptions
On a large social network (similar to Facebook), only 1% of accounts are malicious or "bad"
Bad accounts send friend requests at ten times the rate of good accounts
You have developed a classification model that achieves a 95% true positive rate (TPR) and a 95% true negative rate (TNR)
Questions
1. Single Friend Request Probability
Estimate the probability that a received friend request comes from a bad account.
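One way to approach this is Bayes' rule, weighting each account type by how many requests it sends. The sketch below assumes that "ten times" means a bad account sends 10 requests for every 1 sent by a good account, and that recipients are otherwise equally likely to hear from any sender. The function name `p_request_from_bad` is illustrative and not part of the question.

```python
def p_request_from_bad(prevalence: float = 0.01, rate_ratio: float = 10.0) -> float:
    """P(sender is bad | a friend request was received).

    prevalence: share of accounts that are bad (1% in the prompt).
    rate_ratio: requests sent by a bad account per request sent by a good one
                (assumed to be exactly 10).
    """
    bad_weight = prevalence * rate_ratio      # request volume from bad accounts
    good_weight = (1.0 - prevalence) * 1.0    # request volume from good accounts
    return bad_weight / (bad_weight + good_weight)


p_bad = p_request_from_bad()
print(f"{p_bad:.4f}")  # 0.0917
```

Under these assumptions roughly 9% of received requests come from bad accounts, even though bad accounts are only 1% of the population.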
2. Multiple Friend Requests
Determine the likelihood that, out of five friend requests, at least one originates from a bad account.
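A minimal sketch of the complement approach, 1 minus the probability that all five requests come from good accounts. It assumes the five requests are independent draws and reuses the per-request probability implied by the 1% prevalence and the 10x request rate. In practice, coordinated spam campaigns could make requests correlated, so independence is a simplifying assumption. The function name is hypothetical.

```python
def p_at_least_one_bad(n_requests: int = 5,
                       prevalence: float = 0.01,
                       rate_ratio: float = 10.0) -> float:
    """P(at least one of n independent requests comes from a bad account)."""
    # Per-request probability that the sender is bad (Bayes' rule on request volume).
    p_bad = prevalence * rate_ratio / (prevalence * rate_ratio + (1.0 - prevalence))
    # Complement of "all n requests come from good accounts".
    return 1.0 - (1.0 - p_bad) ** n_requests


print(f"{p_at_least_one_bad():.3f}")  # 0.382
```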
3. Model Reliability
If the model flags an account as bad, how likely is it to be truly malicious, given the stated TPR and TNR?
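This is a positive predictive value (precision) calculation, and the 1% base rate dominates the result. The sketch below assumes the model is applied to accounts drawn from the general population, so the prior is the 1% account prevalence rather than the request-weighted rate. The function name is illustrative.

```python
def precision_when_flagged(prevalence: float = 0.01,
                           tpr: float = 0.95,
                           tnr: float = 0.95) -> float:
    """P(account is bad | model flags it), via Bayes' rule."""
    true_pos = tpr * prevalence                   # bad accounts correctly flagged
    false_pos = (1.0 - tnr) * (1.0 - prevalence)  # good accounts wrongly flagged
    return true_pos / (true_pos + false_pos)


print(f"{precision_when_flagged():.3f}")  # 0.161
```

With these inputs only about 16% of flagged accounts are truly malicious, because the 5% false positive rate is applied to the 99% of accounts that are good.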
4. Data & Features
Identify which types of data (e.g., behavior logs, friend-request patterns, reported incidents) would be most relevant to classify accounts accurately.
5. Assessing "Bad Account" Prevalence
Propose methods (such as stratified or random sampling) to determine whether the bad-account issue is substantial enough to warrant further intervention.
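For the random-sampling option, a rough sample-size calculation helps size the manual labeling effort. The sketch below uses the standard normal-approximation formula for a proportion, n = z^2 * p * (1 - p) / e^2. The expected prevalence of 1% comes from the prompt, while the margin of error (0.2 percentage points) and the 95% confidence level are assumptions chosen for illustration.

```python
import math


def sample_size_for_prevalence(expected_p: float = 0.01,
                               margin: float = 0.002,
                               z: float = 1.96) -> int:
    """Accounts to label so the prevalence estimate is within +/- margin.

    Uses the normal approximation for a simple random sample; margin and z
    are illustrative assumptions, not values given in the question.
    """
    n = (z ** 2) * expected_p * (1.0 - expected_p) / (margin ** 2)
    return math.ceil(n)


print(sample_size_for_prevalence())  # 9508
```

A stratified design (for example, by account age or region) could reach the same precision with fewer labels if prevalence differs sharply across strata.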
6. Defining a "Bad User"
Outline what characteristics or behaviors would qualify an account as "bad" (e.g., spammer, scammer, bot).
7. Platform Impact
Discuss how the presence of malicious users could affect the platform's trust, user experience, and reputation.
8. Friend Request Implications
Examine how frequent friend requests from bad accounts might impact legitimate users and overall community health.
Follow-Up Considerations
Threshold Tuning
Discuss how adjusting the classification threshold could alter false positives and false negatives.
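The trade-off can be illustrated by counting errors at several thresholds. The scores and labels below are hypothetical values made up for illustration, not data from any real model, and `confusion_counts` is an illustrative helper.

```python
def confusion_counts(scores, labels, threshold):
    """Return (false_positives, false_negatives) when flagging scores >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


# Hypothetical model scores and true labels (1 = bad account), for illustration only.
scores = [0.05, 0.20, 0.35, 0.55, 0.60, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

for threshold in (0.3, 0.5, 0.7, 0.9):
    fp, fn = confusion_counts(scores, labels, threshold)
    print(f"threshold={threshold:.1f}  false_positives={fp}  false_negatives={fn}")
```

On this toy data, raising the threshold from 0.3 to 0.9 drops false positives from 3 to 0 while false negatives rise from 0 to 1, which is the general pattern: a stricter threshold protects legitimate users at the cost of missing more bad accounts.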
Cost-Benefit Analysis
Evaluate the resource investment needed for automated detection vs. manual review.
Long-Term Effects of False Positives
Consider how incorrect labeling of legitimate users might erode trust.
Advanced Feature Engineering
Suggest additional signals (e.g., content quality, login behavior) that could enhance detection accuracy.
Scaling & Fairness
Explain how to maintain model performance and fairness across billions of accounts, ensuring no disproportionate impact on certain demographics.
Quick Answer: This question evaluates a data scientist's competency in probabilistic reasoning (Bayes' rule), interpreting classifier performance (sensitivity and specificity), sampling and experiment design, feature engineering for fraud detection, and assessing how malicious accounts affect users of a social network.