This question, from the Statistics & Math domain for Data Scientist roles, evaluates understanding of binary classification metrics (precision, recall, F1) and the ability to compute confusion-matrix counts from actual and predicted labels.
You are given a list of binary classification outputs such as [{"actual": 1, "predicted": 0, "confidence": 0.93}, ...], where class 1 is the positive class. Using the provided actual and predicted labels, write Python or pseudocode to compute the confusion-matrix counts (TP, FP, TN, FN) and then calculate:
precision = TP / (TP + FP)
recall = TP / (TP + FN)
F1 = 2 * precision * recall / (precision + recall)
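The counting and the three formulas above can be sketched as follows. This is a minimal illustration, not a reference implementation; the four-record sample list is invented for demonstration.

```python
# Sample outputs, invented for illustration (positive class is 1).
records = [
    {"actual": 1, "predicted": 0, "confidence": 0.93},
    {"actual": 1, "predicted": 1, "confidence": 0.81},
    {"actual": 0, "predicted": 1, "confidence": 0.60},
    {"actual": 0, "predicted": 0, "confidence": 0.12},
]

# Confusion-matrix counts: compare each actual label to its prediction.
tp = sum(1 for r in records if r["actual"] == 1 and r["predicted"] == 1)
fp = sum(1 for r in records if r["actual"] == 0 and r["predicted"] == 1)
tn = sum(1 for r in records if r["actual"] == 0 and r["predicted"] == 0)
fn = sum(1 for r in records if r["actual"] == 1 and r["predicted"] == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, tn, fn)            # 1 1 1 1 for this sample
print(precision, recall, f1)     # 0.5 0.5 0.5 for this sample
```

For the sample data each cell of the confusion matrix gets one record, so precision, recall, and F1 all come out to 0.5.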
If the interviewer instead asks you to derive the predicted labels from the confidence scores, assume a decision threshold and explain how the metrics change as the threshold moves. Also describe how you would handle edge cases such as no predicted positives (precision is undefined) or no actual positives (recall is undefined).
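One way to sketch the thresholded variant, assuming "confidence" is the model's score for the positive class (that interpretation, the helper name, and the sample data are all assumptions for illustration; the zero guards return 0.0 for the undefined edge cases, though returning None or raising is equally defensible):

```python
def metrics_at_threshold(records, threshold):
    """Derive predictions from confidence scores, then compute P/R/F1.

    Assumes "confidence" is the score for the positive class (class 1);
    a record is predicted positive when its score meets the threshold.
    """
    tp = fp = tn = fn = 0
    for r in records:
        pred = 1 if r["confidence"] >= threshold else 0
        if r["actual"] == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    # Edge cases: no predicted positives -> precision undefined;
    # no actual positives -> recall undefined. Report 0.0 here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Invented sample scores to show the threshold effect.
records = [
    {"actual": 1, "confidence": 0.93},
    {"actual": 0, "confidence": 0.60},
    {"actual": 1, "confidence": 0.40},
    {"actual": 0, "confidence": 0.10},
]

print(metrics_at_threshold(records, 0.5))  # precision 0.5, recall 0.5
print(metrics_at_threshold(records, 0.7))  # precision 1.0, recall 0.5
```

Raising the threshold from 0.5 to 0.7 drops the false positive at 0.60, so precision rises to 1.0; in general a higher threshold trades recall for precision, and at a threshold above every score there are no predicted positives at all, which is exactly the edge case the guards handle.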