This question evaluates a Data Scientist's ability to choose and interpret evaluation metrics for imbalanced binary classifiers. It covers precision/recall trade-offs, ROC-AUC versus PR-AUC, capacity-aware metrics, and cost-sensitive thresholding, and sits in the Machine Learning domain as a practical application requiring conceptual understanding. It is commonly asked because production fraud- and abuse-detection problems demand reasoning about class imbalance, asymmetric business costs of false positives versus false negatives, and limited human-review or enforcement capacity when selecting metrics and operating thresholds.

A sudden spike in average daily comments may be driven by fake users. You are asked to build a binary classifier that flags fake accounts for enforcement.
Which evaluation metrics would you choose for the fake-user classifier and why?
Discuss:
- Precision/recall trade-offs and how they map to the asymmetric business costs of false positives versus false negatives
- ROC-AUC versus PR-AUC, and which is more informative when fake accounts are rare
- Capacity-aware metrics (e.g., precision@k) when human-review or enforcement capacity is limited
- Cost-sensitive threshold selection for the production operating point
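A minimal illustrative sketch of how these trade-offs can be probed in code, assuming scikit-learn and a synthetic dataset with roughly 1% positives; the review budget k and the false-positive/false-negative costs below are hypothetical placeholders, not values from the original prompt:

```python
# Sketch: contrasting ROC-AUC with PR-AUC and picking a cost-sensitive
# threshold on a synthetic, imbalanced dataset (assumed setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# ~1% positives to mimic a rare fake-user class.
X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# ROC-AUC can look strong under imbalance; PR-AUC (average precision) is
# anchored to the rare positive class and is usually more telling here.
print("ROC-AUC:", roc_auc_score(y_te, scores))
print("PR-AUC :", average_precision_score(y_te, scores))

# Capacity-aware view: precision among the top-k accounts the review team
# can actually inspect (k is a hypothetical daily review budget).
k = 200
top_k = np.argsort(scores)[::-1][:k]
print(f"Precision@{k}:", y_te[top_k].mean())

# Cost-sensitive threshold: assume a false positive (wrongly flagged real
# user) costs 1 unit and a false negative (missed fake user) costs 10 units.
cost_fp, cost_fn = 1.0, 10.0
thresholds = np.linspace(0.01, 0.99, 99)
costs = []
for t in thresholds:
    pred = scores >= t
    fp = np.sum(pred & (y_te == 0))
    fn = np.sum(~pred & (y_te == 1))
    costs.append(cost_fp * fp + cost_fn * fn)
print("Cost-minimizing threshold:", thresholds[int(np.argmin(costs))])
```

On data this imbalanced, ROC-AUC often stays high even for mediocre rankers, while PR-AUC and precision@k track how many of the limited review slots are actually spent on fake accounts; the cost-weighted threshold search makes the false-positive/false-negative asymmetry explicit in the operating point.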