This question evaluates competency in computing and interpreting binary classification metrics—precision, recall, and F1—along with understanding thresholding of confidence scores and handling edge cases like zero denominators.
You are given a list of binary classification outputs, where each record contains:
actual: INT (0 or 1)
predict: INT (0 or 1)
conf: FLOAT, the model's confidence score for the positive class
Example record: [actual=1, predict=0, conf=0.93].
Write Python or pseudocode to compute the following metrics for the positive class over the full dataset:
precision
recall
F1 score
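One possible way to compute these metrics is sketched below. It assumes each record is a tuple (actual, predict, conf), which is a representation chosen here for illustration, and it assumes that a metric with a zero denominator should be reported as 0.0. That convention is worth confirming with the interviewer, since returning None or raising an error are also defensible. The names safe_div and precision_recall_f1 are hypothetical.

```python
from typing import Iterable, Tuple

Record = Tuple[int, int, float]  # (actual, predict, conf); assumed layout


def safe_div(num: float, den: float, default: float = 0.0) -> float:
    """Divide, returning `default` when the denominator is zero."""
    return num / den if den else default


def precision_recall_f1(records: Iterable[Record]) -> Tuple[float, float, float]:
    """Positive-class precision, recall, and F1 from the provided labels."""
    tp = fp = fn = 0
    for actual, predict, _conf in records:
        if predict == 1 and actual == 1:
            tp += 1  # true positive
        elif predict == 1 and actual == 0:
            fp += 1  # false positive
        elif predict == 0 and actual == 1:
            fn += 1  # false negative
        # true negatives do not enter any of the three metrics

    precision = safe_div(tp, tp + fp)  # zero if nothing was predicted positive
    recall = safe_div(tp, tp + fn)     # zero if there are no actual positives
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


# Made-up sample data for illustration: one TP, one FP, one FN, one TN.
records = [(1, 0, 0.93), (1, 1, 0.80), (0, 1, 0.60), (0, 0, 0.10)]
print(precision_recall_f1(records))
```

The conf field is ignored here because the provided predict labels are used directly. A single pass over the data keeps the work at O(n) time and O(1) extra space.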
Also explain how you would derive predict from conf if the interviewer asks you to apply a threshold instead of using the provided predicted labels.
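A minimal sketch of the thresholding step follows, assuming records are tuples (actual, predict, conf) and that a score at or above the threshold counts as positive. Both the default threshold of 0.5 and the use of >= rather than > are assumptions to confirm with the interviewer. The function names are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[int, int, float]  # (actual, predict, conf); assumed layout


def predict_from_conf(conf: float, threshold: float = 0.5) -> int:
    """Label as positive (1) when conf meets or exceeds the threshold."""
    return 1 if conf >= threshold else 0


def relabel(records: List[Record], threshold: float = 0.5) -> List[Record]:
    """Return new records whose predict field is re-derived from conf."""
    return [
        (actual, predict_from_conf(conf, threshold), conf)
        for actual, _old_predict, conf in records
    ]


# Made-up sample data for illustration.
records = [(1, 0, 0.93), (1, 1, 0.80), (0, 1, 0.60), (0, 0, 0.10)]
print(relabel(records, threshold=0.5))
print(relabel(records, threshold=0.9))
```

The relabeled records can then be fed to the same precision, recall, and F1 computation as before. Raising the threshold generally makes the model predict fewer positives, which tends to increase precision and reduce recall, so the choice of threshold depends on which error type is more costly.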