This question evaluates a candidate's understanding of classifier evaluation metrics (precision, recall, F1, ROC-AUC), cost-sensitive decision making, threshold tuning, and handling class imbalance for production ML systems.

You have trained a model that flags fake accounts. Leadership wants clear, defensible evidence that it works well in production and a clear account of the trade-offs of using it to take action (e.g., auto-ban vs. human review).
Recommend the evaluation metrics you would use to judge the fake-account classifier and explain why. Discuss the trade-offs among precision, recall, F1, and ROC-AUC, and address how cost-sensitive decision making, threshold tuning, and class imbalance shape your recommendation.
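A strong answer can ground the precision/recall trade-off with a small worked example. The sketch below uses fabricated labels and scores (all names and numbers are hypothetical, not from any real system) to show how moving the decision threshold trades recall for precision, which is the core of the auto-ban vs. human-review decision:

```python
def precision_recall_f1(y_true, y_score, threshold):
    """Compute precision, recall, and F1 at a given score threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, y_score):
        pred = s >= threshold  # flag the account if its score clears the threshold
        if pred and y:
            tp += 1            # flagged and actually fake
        elif pred and not y:
            fp += 1            # flagged but legitimate (cost of auto-ban errors)
        elif not pred and y:
            fn += 1            # missed fake account
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Toy data: 1 = fake account, 0 = legitimate; scores are model outputs.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.92, 0.40, 0.75, 0.55, 0.30, 0.85, 0.60, 0.10]

for t in (0.5, 0.7, 0.9):
    p, r, f = precision_recall_f1(y_true, y_score, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}")
```

Raising the threshold increases precision (fewer legitimate users auto-banned) at the cost of recall (more fakes slip through), so a high threshold suits auto-ban while a lower one suits routing to human review.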