PracHub

How do you choose a classification threshold?

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in binary classification threshold selection, end-to-end ML pipeline design, model evaluation, and operational trade-offs including class imbalance, asymmetric error costs, and limited escalation capacity, within the Machine Learning domain of model evaluation and deployment.


Company: TikTok

Role: Data Scientist

Category: Machine Learning

Difficulty: easy

Interview Round: Technical Screen



Nov 8, 2025, 12:00 AM

Context

You built a binary sentiment classification model (e.g., positive vs. negative) and need to deploy it in a product where actions depend on the model’s output.

Questions

  1. Walk through your ML pipeline end-to-end:
    • Data sourcing/labeling and dataset construction (train/validation/test splits).
    • Feature design or model choice (e.g., TF-IDF + linear model vs. transformer).
    • Training procedure and evaluation setup.
    • Key practical challenges (class imbalance, noisy labels, distribution shift) and how you handled them.
  2. Modeling choices:
    • Why did you choose method/model X over alternatives?
    • What assumptions does it make, and what trade-offs does it introduce (latency, interpretability, cost, robustness)?
  3. Iterative refinement:
    • Describe how you iteratively improved the system (e.g., error analysis → new features/data → retrain → re-evaluate).
    • What were your biggest learnings and what would you do differently next time?
  4. Threshold selection for deployment:
    • Your model outputs a probability score. How do you choose the decision threshold?
    • Which metrics would you consider (precision, recall, F1, ROC-AUC, PR-AUC, cost-weighted loss), and which would be primary vs. diagnostic vs. guardrail?
    • How would the answer change under:
      • Severe class imbalance
      • Different costs for false positives vs. false negatives
      • A fixed review/ops capacity (e.g., only 1,000 items/day can be escalated)
  5. Metric definition:
    • If a stakeholder proposes defining “success” as metric XXX, how do you evaluate whether that definition is appropriate?
    • What data issues (label leakage, delayed labels, sampling bias) could make the metric misleading?
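
For question 4, the threshold discussion can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not a reference answer: the function names (`cost_optimal_threshold`, `empirical_cost_threshold`, `capacity_threshold`, `precision_recall_at`) and the toy validation scores and labels are hypothetical and do not come from the question. Scores are assumed to be probabilities for the positive class, and the closed-form rule additionally assumes they are well calibrated.

```python
from typing import List, Sequence, Tuple


def cost_optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Closed-form threshold, assuming calibrated probabilities.

    Flagging an item with P(positive) = p risks cost_fp * (1 - p);
    not flagging it risks cost_fn * p. Flagging is cheaper once
    p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def empirical_cost_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    cost_fp: float,
    cost_fn: float,
) -> Tuple[float, float]:
    """Scan candidate thresholds on validation data and return the
    (threshold, total_cost) pair with the lowest cost. This does not
    rely on calibration, only on labeled validation examples."""
    # float("inf") represents "flag nothing".
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    best_t, best_cost = candidates[0], float("inf")
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


def capacity_threshold(scores: Sequence[float], capacity: int) -> float:
    """Lowest threshold such that at most `capacity` items have
    score >= threshold (e.g. a daily review budget)."""
    if capacity <= 0:
        return float("inf")
    ranked = sorted(scores, reverse=True)
    if capacity >= len(ranked):
        return ranked[-1]
    t = ranked[capacity - 1]
    if ranked[capacity] == t:
        # A tie straddles the cut; move up so the budget is not exceeded.
        higher = [s for s in ranked if s > t]
        return min(higher) if higher else float("inf")
    return t


def precision_recall_at(
    scores: Sequence[float], labels: Sequence[int], t: float
) -> Tuple[float, float]:
    """Precision and recall when items with score >= t are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy validation set (hypothetical numbers, for illustration only).
val_scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]
val_labels = [1, 1, 0, 1, 0, 0, 1, 0]

# A false negative is assumed to cost 4x a false positive.
t_closed_form = cost_optimal_threshold(cost_fp=1.0, cost_fn=4.0)
t_empirical, total_cost = empirical_cost_threshold(
    val_scores, val_labels, cost_fp=1.0, cost_fn=4.0
)
# Only 3 items can be escalated from this batch.
t_capacity = capacity_threshold(val_scores, capacity=3)
precision, recall = precision_recall_at(val_scores, val_labels, t_capacity)
```

In this sketch the cost-based threshold plays the role of the primary decision rule, while the precision and recall computed at the capacity-limited threshold serve as diagnostics. When the capacity threshold sits above the cost-optimal one, the review budget is the binding constraint, and the recall lost at that cut is the number to report to stakeholders.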