Answer ML fundamentals and diagnostics questions

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency with confusion-matrix metrics (recall and false positive rate), ensemble learning trade-offs, decision-tree split/impurity criteria, training-loss and learning-curve diagnostics, regularization effects, and comparative training-speed considerations for Random Forest versus gradient-boosted models.


Company: TikTok

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: hard

Interview Round: Take-home Project


Jan 22, 2026, 12:00 AM

You are taking a timed online assessment with multiple-select and numeric-response questions.

1) Confusion-matrix metrics (multiple select)

A binary classifier is evaluated on 200 examples. For each option below, the confusion matrix is given as (TP, FP, FN, TN).

Select all options where:

  • Recall \(= \frac{TP}{TP+FN}\) is > 0.90, and
  • False Positive Rate (FPR) \(= \frac{FP}{FP+TN}\) is < 0.10.

Options:

  • A: (95, 8, 5, 92)
  • B: (92, 12, 8, 88)
  • C: (180, 18, 20, 182)
  • D: (45, 1, 5, 149)
  • E: (91, 9, 9, 91)
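
The two thresholds can be checked mechanically. The following is a minimal sketch, assuming each tuple follows the stated (TP, FP, FN, TN) order. The helper names `recall`, `fpr`, and `passes` are illustrative and not part of the assessment. Exact fractions are used so that a value sitting exactly on a threshold is not misjudged because of floating-point rounding.

```python
from fractions import Fraction

# Options copied from the question, in (TP, FP, FN, TN) order.
OPTIONS = {
    "A": (95, 8, 5, 92),
    "B": (92, 12, 8, 88),
    "C": (180, 18, 20, 182),
    "D": (45, 1, 5, 149),
    "E": (91, 9, 9, 91),
}

def recall(tp, fp, fn, tn):
    """Recall = TP / (TP + FN)."""
    return Fraction(tp, tp + fn)

def fpr(tp, fp, fn, tn):
    """False positive rate = FP / (FP + TN)."""
    return Fraction(fp, fp + tn)

def passes(cm):
    """True when recall > 0.90 and FPR < 0.10 (both inequalities strict)."""
    return recall(*cm) > Fraction(9, 10) and fpr(*cm) < Fraction(1, 10)

for name, cm in OPTIONS.items():
    # Columns: option, recall, FPR, passes both tests, total example count.
    print(name, float(recall(*cm)), float(fpr(*cm)), passes(cm), sum(cm))
```

Both inequalities are strict, so a recall of exactly 0.90 does not qualify. The last printed column is the total number of examples in each matrix, which makes it easy to compare every option against the stated 200.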

2) Ensemble learning for loan-default prediction (multiple select)

You built an initial classifier to predict whether a customer will default on a loan, but its accuracy is only slightly better than chance. You consider using an ensemble method.

Select all statements that are true:

  • (1) If the dataset contains both linear and non-linear relationships, ensemble learning can impair performance compared to most approaches.
  • (2) Modern ensemble learning techniques can improve overall model interpretability.
  • (3) Ensemble learning techniques can be time-intensive to train.
  • (4) Ensemble learning techniques typically create overfitted models.
  • (5) If the dataset contains both linear and non-linear relationships, ensemble learning can improve performance compared to most approaches.
  • (6) None of the above.
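
One way to build intuition for why ensembles are considered when a single model is only slightly better than chance is the idealized majority-vote calculation below. It is a sketch under a strong assumption that rarely holds in practice: the voters are statistically independent and each is correct with the same probability `p`. The function name `majority_vote_accuracy` and the value 0.55 are illustrative choices, not taken from the question.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a strict majority of n independent voters, each
    correct with probability p, is correct. n must be odd to avoid ties."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# Assumed per-model accuracy of 0.55 (slightly better than chance).
for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(0.55, n), 3))
```

Real base models are correlated, so the gain is usually smaller than this calculation suggests, and training `n` models costs roughly `n` times as much as training one.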

3) Decision-tree split criteria (multiple select)

You are choosing an impurity measure to score candidate splits in a classification decision tree.

Select all options that are valid impurity/split criteria:

  • Entropy
  • Classification Error
  • Gini index
  • Pruning
  • None of the above
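
For reference, the three standard node-impurity measures can be written in a few lines. This is a minimal sketch using the textbook definitions over a list of class probabilities at a node; the function names are illustrative. Pruning has no analogous formula because it is a procedure applied to a tree rather than a score of a node.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)), skipping empty classes."""
    return sum(-p * log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini index: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in probs)

def classification_error(probs):
    """Misclassification error: 1 - max(p)."""
    return 1.0 - max(probs)

# A pure node and a maximally mixed two-class node.
for probs in ([1.0, 0.0], [0.5, 0.5]):
    print(probs, entropy(probs), gini(probs), classification_error(probs))
```

All three measures are zero for a pure node and largest for a uniform class mix, which is what lets them score how much a candidate split reduces impurity.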

4) Training loss increases every epoch (multiple select)

You train a model to detect intrusion attempts. You notice that training loss consistently increases every epoch.

Select all rationales that could plausibly cause this:

  • Regularization is too high
  • Step size (learning rate) is too large
  • Regularization is too low
  • Step size is too small
  • None of the above
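
The effect of step size on training loss can be seen on a toy problem. The sketch below assumes plain gradient descent on the one-dimensional loss \(L(w) = w^2\), which is a stand-in chosen for illustration and not the intrusion-detection model from the question. For this loss each update multiplies `w` by `(1 - 2 * learning_rate)`, so the loss shrinks when the step size is between 0 and 1 and grows when it exceeds 1.

```python
def gradient_descent_losses(learning_rate, steps=5, w0=1.0):
    """Run plain gradient descent on the toy loss L(w) = w^2 and return
    the initial loss followed by the loss after each update."""
    w = w0
    losses = [w * w]
    for _ in range(steps):
        grad = 2.0 * w                 # dL/dw for L(w) = w^2
        w = w - learning_rate * grad   # standard gradient step
        losses.append(w * w)
    return losses

print(gradient_descent_losses(0.1))   # small step: loss falls every update
print(gradient_descent_losses(1.1))   # oversized step: loss rises every update
```

The threshold of 1.0 is specific to this toy loss; in a real model the critical step size depends on the curvature of the loss surface.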

5) Learning-curve diagnosis (multiple select)

You are training a sentiment regressor that predicts a score in \([-1.0, +1.0]\) using 47 features.

You observe this pattern:

  • Training error decreases steadily and becomes very low.
  • Validation error decreases initially, then starts increasing and stays much higher than training error.

Select all actions that best address the problem:

  • Reduce the size of the training data
  • Include a regularization component to your model
  • Increase the number of features included in your data
  • Increase the number of epochs used to train your model
  • Choose a more complex modeling technique
  • None of the above
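
The pattern described above can be detected programmatically from per-epoch error curves. The following is a minimal sketch; the function name `diagnose` and the numeric curves are hypothetical, invented only to mirror the described shape (training error falling steadily, validation error falling and then rising).

```python
def diagnose(train_err, val_err):
    """Summarise a pair of per-epoch error curves.

    Returns the 0-based epoch with the lowest validation error, the
    validation/training gap at the final epoch, and whether validation
    error rose after its minimum while training error kept falling.
    """
    best_epoch = min(range(len(val_err)), key=val_err.__getitem__)
    final_gap = val_err[-1] - train_err[-1]
    val_rose = val_err[-1] > val_err[best_epoch]
    train_fell = train_err[-1] < train_err[best_epoch]
    return {
        "best_epoch": best_epoch,
        "final_gap": final_gap,
        "overfitting_pattern": val_rose and train_fell,
    }

# Hypothetical curves, not measurements from any real model.
train = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08, 0.04]
val = [0.95, 0.70, 0.55, 0.50, 0.54, 0.60, 0.67]
print(diagnose(train, val))
```

A widening gap after the validation minimum is the usual signature of overfitting, which is the situation that constraining the model (for example with a regularization term) is meant to address.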

6) Random Forest vs Gradient Boosting (training speed focus) (multiple select)

You need a fast-to-train baseline fraud classifier. You are choosing between a Random Forest and a Gradient Boosting Machine (GBM).

Select all statements that are true and relevant to training speed:

  • (1) GBM fits trees sequentially rather than independently like Random Forest.
  • (2) GBM is harder to overfit than Random Forest.
  • (3) Random Forest is slower than GBM for real-time prediction.
  • (4) GBMs are typically more accurate than Random Forest for anomaly detection.
  • (5) Random Forest has fewer parameters than GBM.
  • (6) None of the above.
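
The structural difference between the two training procedures can be sketched in pure Python. This is a toy illustration under several assumptions: one-split regression stumps stand in for full trees, the loss is squared error on a tiny made-up dataset rather than a fraud classifier, and all function names (`fit_stump`, `bagged_stumps`, `boosted_stumps`) are hypothetical. It is not how any particular library implements either method.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression stump: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds at every distinct x except the largest
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lv, rv = mean(left), mean(right)
        sse = sum((y - lv) ** 2 for y in left) + sum((y - rv) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x identical: fall back to a constant predictor
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]

def predict(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def bagged_stumps(xs, ys, seeds):
    """Random-Forest-style: each stump sees only its own bootstrap sample,
    so the fits are independent and could run in parallel or in any order."""
    stumps = {}
    for seed in seeds:
        rng = random.Random(seed)
        idx = [rng.randrange(len(xs)) for _ in xs]
        stumps[seed] = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
    return stumps

def boosted_stumps(xs, ys, n_rounds, lr=0.5):
    """GBM-style (squared error): each stump is fit to the residuals left by
    all earlier stumps, so round k cannot start before round k-1 finishes."""
    preds = [0.0] * len(xs)
    stumps, mse_per_round = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * predict(stump, x) for p, x in zip(preds, xs)]
        mse_per_round.append(mean((y - p) ** 2 for y, p in zip(ys, preds)))
    return stumps, mse_per_round

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
print(bagged_stumps(xs, ys, [0, 1, 2]) == bagged_stumps(xs, ys, [2, 1, 0]))
print(boosted_stumps(xs, ys, n_rounds=4)[1])
```

In `bagged_stumps` the loop body never reads another stump's result, which is why the fitting order does not matter. In `boosted_stumps` the `residuals` line depends on `preds` from every earlier round, which is the data dependency that prevents the rounds from being trained in parallel.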

7) Manual forward pass in a small neural network (numeric)

Compute the output of the following neural network. Assume:

  • All biases are 0.
  • Hidden activations \(f_1\) and \(f_2\) are linear (identity).
  • Output activation \(f_3\) is sigmoid: \(\sigma(z)=\frac{1}{1+e^{-z}}\).

Inputs:

  • \(x_1 = 1.2\)
  • \(x_2 = -0.7\)

Hidden layer (2 units):

  • \(h_1 = 0.5\,x_1 + (-1.0)\,x_2\)
  • \(h_2 = (-0.25)\,x_1 + 0.75\,x_2\)

Output layer (1 unit):

  • \(z = 1.0\,h_1 + 0.5\,h_2\)
  • \(\hat{y} = \sigma(z)\)

Return \(\hat{y}\) rounded to the nearest thousandth (3 decimals).
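
The forward pass can be reproduced directly from the definitions above. This is a minimal sketch that uses only the weights and inputs given in the question; the variable names and the `sigmoid` helper are illustrative.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

x1, x2 = 1.2, -0.7

# Hidden layer: identity activations, zero biases.
h1 = 0.5 * x1 + (-1.0) * x2
h2 = (-0.25) * x1 + 0.75 * x2

# Output layer: weighted sum of the hidden units, then sigmoid.
z = 1.0 * h1 + 0.5 * h2
y_hat = sigmoid(z)

print(round(h1, 4), round(h2, 4), round(z, 4), round(y_hat, 3))
# prints: 1.3 -0.825 0.8875 0.708
```

Because the hidden activations are linear and the biases are zero, the hidden layer collapses into a single linear map, so `z` can equally be computed as one weighted sum of `x1` and `x2` before applying the sigmoid.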
