
How do you choose a classification threshold?

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in binary classification threshold selection, end-to-end ML pipeline design, and model evaluation, along with operational trade-offs such as class imbalance, asymmetric error costs, and limited escalation capacity. It sits in the model evaluation and deployment area of the Machine Learning domain.



Company: TikTok

Role: Data Scientist

Category: Machine Learning

Difficulty: easy

Interview Round: Technical Screen



Related Interview Questions

  • Design multimodal deployment under compute limits - TikTok (easy)
  • Explain overfitting, dropout, normalization, RL post-training - TikTok (medium)
  • Write self-attention and cross-entropy pseudocode - TikTok (medium)
  • Implement AUC-ROC, softmax, and logistic regression - TikTok (medium)
  • Answer ML fundamentals and diagnostics questions - TikTok (hard)

Context

You built a binary sentiment classification model (e.g., positive vs. negative) and need to deploy it in a product where actions depend on the model’s output.

Questions

  1. Walk through your ML pipeline end-to-end (a baseline sketch follows this list):
    • Data sourcing/labeling and dataset construction (train/validation/test splits).
    • Feature design or model choice (e.g., TF-IDF + linear model vs. transformer).
    • Training procedure and evaluation setup.
    • Key practical challenges (class imbalance, noisy labels, distribution shift) and how you handled them.
  2. Modeling choices:
    • Why did you choose method/model X over alternatives?
    • What assumptions does it make, and what trade-offs does it introduce (latency, interpretability, cost, robustness)?
  3. Iterative refinement:
    • Describe how you iteratively improved the system (e.g., error analysis → new features/data → retrain → re-evaluate).
    • What were your biggest learnings and what would you do differently next time?
  4. Threshold selection for deployment (see the threshold-selection sketch after this list):
    • Your model outputs a probability score. How do you choose the decision threshold?
    • Which metrics would you consider (precision, recall, F1, ROC-AUC, PR-AUC, cost-weighted loss), and which would be primary vs. diagnostic vs. guardrail?
    • How would the answer change under:
      • Severe class imbalance
      • Different costs for false positives vs. false negatives
      • A fixed review/ops capacity (e.g., only 1,000 items/day can be escalated)
  5. Metric definition:
    • If a stakeholder proposes defining “success” as metric XXX, how do you evaluate whether that definition is appropriate?
    • What data issues (label leakage, delayed labels, sampling bias) could make the metric misleading?
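For question 1, a minimal baseline sketch, assuming scikit-learn; `texts` and `labels` are hypothetical stand-ins for a labeled sentiment dataset (a list of strings and 0/1 labels):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Stratified 60/20/20 train/validation/test split; stratification preserves
# the class ratio, which matters under imbalance.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0)

# TF-IDF + linear model: a fast, interpretable baseline to beat before
# reaching for a transformer. class_weight="balanced" is one simple way to
# compensate for class imbalance at training time.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
model.fit(X_train, y_train)

# Tune and select on the validation set; touch the test set only once,
# at the very end.
print(classification_report(y_val, model.predict(X_val)))
```

Error analysis on the validation mistakes (question 3) then drives the next iteration: new features, more or cleaner labels, or a stronger model.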
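For question 4, one way to make the three deployment scenarios concrete on a validation set, reusing `model`, `X_val`, and `y_val` from the sketch above. The 5:1 cost ratio, the 0.9 precision floor, and the 1,000-item/day capacity are illustrative assumptions, not fixed answers:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_val = np.asarray(y_val)
scores = model.predict_proba(X_val)[:, 1]  # P(positive) per item

# (a) Asymmetric costs: sweep thresholds and minimize expected cost when a
# false negative is assumed to cost 5x a false positive.
COST_FP, COST_FN = 1.0, 5.0
grid = np.linspace(0.01, 0.99, 99)
cost = [COST_FP * np.sum((scores >= t) & (y_val == 0))
        + COST_FN * np.sum((scores < t) & (y_val == 1)) for t in grid]
t_cost = grid[int(np.argmin(cost))]

# (b) Guardrail metric: maximize recall subject to precision >= 0.9.
# precision_recall_curve returns thresholds aligned with prec[:-1]/rec[:-1].
prec, rec, thr = precision_recall_curve(y_val, scores)
ok = prec[:-1] >= 0.9
t_guardrail = thr[ok][np.argmax(rec[:-1][ok])] if ok.any() else None

# (c) Fixed review capacity: with a budget of K escalations per day, the
# threshold is simply the score of the K-th highest-scoring item.
K = 1000  # assumed daily ops capacity
t_capacity = np.sort(scores)[-K] if len(scores) >= K else scores.min()
```

Under severe class imbalance, PR-AUC and the precision/recall trade-off at the chosen threshold are more informative diagnostics than ROC-AUC, which can look deceptively high when negatives dominate.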

