Context
You built a binary sentiment classification model (e.g., positive vs. negative) and need to deploy it in a product where actions depend on the model’s output.
Questions
- Walk through your ML pipeline **end-to-end** (a minimal sketch follows this question):
  - Data sourcing/labeling and dataset construction (train/validation/test splits).
  - Feature design or model choice (e.g., TF-IDF + linear model vs. transformer).
  - Training procedure and evaluation setup.
  - Key practical challenges (class imbalance, noisy labels, distribution shift) and how you handled them.
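A minimal sketch of one such pipeline, assuming a TF-IDF + logistic regression baseline in scikit-learn and a hypothetical `reviews.csv` with `text` and `label` columns (none of these names come from the question itself):

```python
# Baseline sentiment pipeline: TF-IDF features + logistic regression.
# `reviews.csv`, `text`, and `label` are illustrative placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("reviews.csv")

# Stratified 60/20/20 split so both classes are represented in every set.
train_df, temp_df = train_test_split(
    df, test_size=0.4, stratify=df["label"], random_state=42
)
val_df, test_df = train_test_split(
    temp_df, test_size=0.5, stratify=temp_df["label"], random_state=42
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2, max_features=100_000)),
    # class_weight="balanced" is one simple lever against class imbalance.
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
pipeline.fit(train_df["text"], train_df["label"])

# Tune and analyze on the validation set; touch the test set only once, at the end.
val_pred = pipeline.predict(val_df["text"])
print(classification_report(val_df["label"], val_pred, digits=3))
```

The later sketches in this section reuse `df`, `pipeline`, `val_df`, and `train_test_split` from this block.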
- Modeling choices:
  - Why did you choose method/model **X** over alternatives?
  - What assumptions does it make, and what trade-offs does it introduce (latency, interpretability, cost, robustness)? (The coefficient-inspection sketch below illustrates the interpretability point.)
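One way to make the interpretability trade-off tangible: the linear baseline exposes per-n-gram weights that can be read directly, which a transformer does not offer out of the box. This continues the hypothetical `pipeline` from the previous sketch:

```python
# Inspect the linear model's strongest sentiment cues (illustrative only).
import numpy as np

vectorizer = pipeline.named_steps["tfidf"]
clf = pipeline.named_steps["clf"]

feature_names = vectorizer.get_feature_names_out()  # scikit-learn >= 1.0
coefs = clf.coef_[0]  # one weight per TF-IDF feature in the binary model

top_positive = np.argsort(coefs)[-10:][::-1]
top_negative = np.argsort(coefs)[:10]
print("positive cues:", feature_names[top_positive])
print("negative cues:", feature_names[top_negative])
```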
- Iterative refinement:
  - Describe how you **iteratively improved** the system (e.g., error analysis → new features/data → retrain → re-evaluate); a small error-analysis sketch follows this question.
  - What were your biggest learnings, and what would you do differently next time?
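One step in that loop might look like the following, again continuing the hypothetical names from the pipeline sketch: rank the validation mistakes by model confidence and review them by hand to decide what data or features to add next.

```python
# Surface the most confidently wrong validation examples for manual review.
val_proba = pipeline.predict_proba(val_df["text"])[:, 1]

analysis = val_df.copy()
analysis["p_positive"] = val_proba
analysis["pred"] = (val_proba >= 0.5).astype(int)

mistakes = analysis[analysis["pred"] != analysis["label"]].copy()
# High-confidence mistakes often point at label noise, sarcasm, negation,
# or domain vocabulary the features miss.
mistakes["confidence"] = (mistakes["p_positive"] - 0.5).abs()
print(
    mistakes.sort_values("confidence", ascending=False)
    .head(20)[["text", "label", "p_positive"]]
)
```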
- Threshold selection for deployment:
  - Your model outputs a probability score. How do you choose the **decision threshold**? (A sketch covering the cases below follows this question.)
  - Which metrics would you consider (precision, recall, F1, ROC-AUC, PR-AUC, cost-weighted loss), and which would be **primary vs. diagnostic vs. guardrail**?
  - How would the answer change under:
    - Severe class imbalance
    - Different costs for false positives vs. false negatives
    - A fixed review/ops capacity (e.g., only 1,000 items/day can be escalated)
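A sketch of how the threshold choice plays out under each of these regimes, using the validation scores from the earlier pipeline sketch; the costs, capacity, and traffic numbers are placeholders, not values from the question:

```python
# Threshold selection on validation scores (all costs/volumes are placeholders).
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_val = val_df["label"].to_numpy()
scores = pipeline.predict_proba(val_df["text"])[:, 1]

# Diagnostic metric that doesn't depend on a threshold.
print(f"PR-AUC (useful under imbalance): {average_precision_score(y_val, scores):.3f}")

# 1) Metric-driven: threshold that maximizes F1 on validation.
precision, recall, thresholds = precision_recall_curve(y_val, scores)
f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
best_f1_threshold = thresholds[np.argmax(f1[:-1])]

# 2) Cost-driven: minimize expected cost with asymmetric FP/FN costs.
COST_FP, COST_FN = 1.0, 5.0  # placeholder business costs

def expected_cost(threshold):
    flagged = scores >= threshold
    false_positives = np.sum(flagged & (y_val == 0))
    false_negatives = np.sum(~flagged & (y_val == 1))
    return COST_FP * false_positives + COST_FN * false_negatives

best_cost_threshold = min(np.linspace(0.01, 0.99, 99), key=expected_cost)

# 3) Capacity-driven: flag only as many items as reviewers can handle,
#    which turns the threshold question into a top-K ranking question.
DAILY_CAPACITY = 1_000   # placeholder review capacity
DAILY_VOLUME = 20_000    # placeholder expected traffic
capacity_threshold = np.quantile(scores, 1 - DAILY_CAPACITY / DAILY_VOLUME)

print(f"F1-optimal threshold:           {best_f1_threshold:.3f}")
print(f"Cost-optimal threshold:         {best_cost_threshold:.3f}")
print(f"Capacity-constrained threshold: {capacity_threshold:.3f}")
```

Under severe imbalance, accuracy and ROC-AUC can look healthy while precision at the deployed threshold is poor, which is why PR-based metrics and the cost-weighted view tend to carry more weight in the final choice.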
- Metric definition:
  - If a stakeholder proposes defining “success” as metric XXX, how do you evaluate whether that definition is appropriate?
  - What data issues (label leakage, delayed labels, sampling bias) could make the metric misleading? (A quick split-comparison check is sketched below.)
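One quick diagnostic for such issues, assuming the dataset has a timestamp column (here a hypothetical `created_at`): compare a random split against a temporal split with the same model; a large gap is a common symptom of leakage or distribution shift that would make the offline metric misleading.

```python
# Compare a random split vs. a temporal split to surface leakage/shift.
# Reuses `df`, `pipeline`, and `train_test_split` from the pipeline sketch;
# the "created_at" column is a hypothetical timestamp.
from sklearn.base import clone
from sklearn.metrics import roc_auc_score

def fit_and_score(train, test):
    model = clone(pipeline)  # fresh, unfitted copy of the same pipeline
    model.fit(train["text"], train["label"])
    return roc_auc_score(test["label"], model.predict_proba(test["text"])[:, 1])

# Random split: optimistic if near-duplicates or the same users/products
# end up on both sides.
rand_train, rand_test = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=0
)

# Temporal split: train on the past, evaluate on the most recent 20%.
df_sorted = df.sort_values("created_at")
cutoff = int(0.8 * len(df_sorted))
time_train, time_test = df_sorted.iloc[:cutoff], df_sorted.iloc[cutoff:]

print(f"ROC-AUC, random split:   {fit_and_score(rand_train, rand_test):.3f}")
print(f"ROC-AUC, temporal split: {fit_and_score(time_train, time_test):.3f}")
```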