
Context: 1% of accounts are bad. Bad accounts send friend requests at 10× the rate of good accounts.
We build a classifier to detect bad accounts. It achieves a true positive rate (TPR) of 95% and a true negative rate (TNR) of 95%.
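The figures above imply two quantities worth computing before answering the questions: the precision of the classifier (the share of flagged accounts that are actually bad) and the share of all friend requests that come from bad accounts. The sketch below works both out with Bayes' rule. The function names are illustrative, and it assumes the 10× figure is a per-account average rate.

```python
def precision(prevalence, tpr, tnr):
    """P(bad | flagged) by Bayes' rule."""
    true_pos = prevalence * tpr                # bad accounts that get flagged
    false_pos = (1 - prevalence) * (1 - tnr)   # good accounts that get flagged
    return true_pos / (true_pos + false_pos)


def bad_request_share(prevalence, rate_ratio):
    """Fraction of all friend requests sent by bad accounts.

    Assumes each bad account sends `rate_ratio` times as many requests
    as each good account, on average.
    """
    bad = prevalence * rate_ratio
    good = 1 - prevalence
    return bad / (bad + good)


p = precision(prevalence=0.01, tpr=0.95, tnr=0.95)
s = bad_request_share(prevalence=0.01, rate_ratio=10)
print(f"precision: {p:.3f}")                   # precision: 0.161
print(f"bad share of requests: {s:.3f}")       # bad share of requests: 0.092
```

Despite 95% TPR and TNR, only about 16% of flagged accounts are bad, because good accounts outnumber bad ones 99 to 1. Bad accounts, though only 1% of the population, send roughly 9% of friend requests.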
What types of data would you use to classify an account as bad or good?
How would you determine whether bad accounts pose a significant issue to the platform? Would you use stratified sampling, random sampling, or another approach?
How would you define a “bad user” in the context of a social media platform?
What are the potential impacts of fraudulent or bad users on the platform and its community?
What potential effects might arise from friend requests initiated by bad accounts?
When building a machine learning model to identify bad accounts, how would you approach the tradeoff between precision and recall? In which situations would you prioritize one over the other?
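For the precision/recall question, it can help to see how precision moves as the operating point changes at the 1% base rate given in the context. The sketch below is only an illustration: apart from the first row, which matches the stated 95% TPR and 95% TNR, the operating points are made-up values and not measurements from any real model.

```python
def precision_at(prevalence, recall, fpr):
    """Precision implied by a (recall, false positive rate) operating point."""
    true_pos = prevalence * recall
    false_pos = (1 - prevalence) * fpr
    return true_pos / (true_pos + false_pos)


PREVALENCE = 0.01  # 1% of accounts are bad (from the context)

# (recall, false positive rate) pairs. The first matches the stated
# classifier; the others are hypothetical stricter thresholds.
operating_points = [
    (0.95, 0.05),
    (0.80, 0.01),
    (0.50, 0.001),
]

results = [
    (recall, fpr, precision_at(PREVALENCE, recall, fpr))
    for recall, fpr in operating_points
]

for recall, fpr, prec in results:
    print(f"recall={recall:.2f}  fpr={fpr:.3f}  precision={prec:.3f}")
```

Under these assumed numbers, tightening the threshold trades recall for precision. A high-precision point suits automatic actions such as account suspension, where a false positive harms a legitimate user. A high-recall point suits cheaper interventions such as queueing accounts for manual review or rate-limiting friend requests.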