PracHub

Answer ML fundamentals and diagnostics questions

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency with confusion-matrix metrics (recall and false positive rate), ensemble learning trade-offs, decision-tree split/impurity criteria, training-loss and learning-curve diagnostics, regularization effects, and comparative training-speed considerations for Random Forest versus gradient-boosted models.

  • hard
  • TikTok
  • Machine Learning
  • Machine Learning Engineer


Company: TikTok

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: hard

Interview Round: Take-home Project



Related Interview Questions

  • Design multimodal deployment under compute limits - TikTok (easy)
  • Explain overfitting, dropout, normalization, RL post-training - TikTok (medium)
  • Write self-attention and cross-entropy pseudocode - TikTok (medium)
  • Implement AUC-ROC, softmax, and logistic regression - TikTok (medium)
  • Explain FlashAttention, KV cache, and RoPE - TikTok (medium)

You are taking a timed online assessment with multiple-select and numeric-response questions.

1) Confusion-matrix metrics (multiple select)

A binary classifier is evaluated on 200 examples. For each option below, the confusion matrix is given as (TP, FP, FN, TN).

Select all options where:

  • Recall = TP / (TP + FN) is > 0.90, and
  • False Positive Rate (FPR) = FP / (FP + TN) is < 0.10.

Options:

  • A: (95, 8, 5, 92)
  • B: (92, 12, 8, 88)
  • C: (180, 18, 20, 182)
  • D: (45, 1, 5, 149)
  • E: (91, 9, 9, 91)
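One way to work through the options is to compute both metrics directly from each matrix (a quick sketch; the labels, matrices, and thresholds are copied from the question above):

```python
# Recall = TP / (TP + FN); FPR = FP / (FP + TN).
# An option qualifies when recall > 0.90 AND FPR < 0.10 (strict inequalities).
options = {
    "A": (95, 8, 5, 92),
    "B": (92, 12, 8, 88),
    "C": (180, 18, 20, 182),
    "D": (45, 1, 5, 149),
    "E": (91, 9, 9, 91),
}

for label, (tp, fp, fn, tn) in options.items():
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    ok = recall > 0.90 and fpr < 0.10
    print(f"{label}: recall={recall:.3f}, FPR={fpr:.3f}, qualifies={ok}")
```

Note that the strict "> 0.90" matters here: an option whose recall is exactly 0.90 does not qualify, however low its FPR is.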

2) Ensemble learning for loan-default prediction (multiple select)

You built an initial classifier to predict whether a customer will default on a loan, but its accuracy is only slightly better than chance. You consider using an ensemble method.

Select all statements that are true:

  • (1) If the dataset contains both linear and non-linear relationships, ensemble learning can impair performance compared to most approaches.
  • (2) Modern ensemble learning techniques can improve overall model interpretability.
  • (3) Ensemble learning techniques can be time-intensive to train.
  • (4) Ensemble learning techniques typically create overfitted models.
  • (5) If the dataset contains both linear and non-linear relationships, ensemble learning can improve performance compared to most approaches.
  • (6) None of the above.

3) Decision-tree split criteria (multiple select)

You are choosing an impurity measure to score candidate splits in a classification decision tree.

Select all options that are valid impurity/split criteria:

  • Entropy
  • Classification Error
  • Gini index
  • Pruning
  • None of the above
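The three standard impurity measures named above can all be computed from a node's class-probability distribution; a minimal sketch (the probabilities are illustrative):

```python
import math

# Impurity measures for a classification node, given the
# class-probability distribution p at that node.
def gini(p):
    return 1.0 - sum(pi * pi for pi in p)

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def classification_error(p):
    return 1.0 - max(p)

# A perfectly mixed binary node is maximally impure under all three:
p = [0.5, 0.5]
print(gini(p), entropy(p), classification_error(p))  # 0.5 1.0 0.5
```

Pruning, by contrast, is a tree-simplification step applied after (or while) growing the tree; it is not an impurity measure used to score splits.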

4) Training loss increases every epoch (multiple select)

You train a model to detect intrusion attempts. You notice that training loss consistently increases every epoch.

Select all rationales that could plausibly cause this:

  • Regularization is too high
  • Step size (learning rate) is too large
  • Regularization is too low
  • Step size is too small
  • None of the above
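The classic cause of a monotonically increasing training loss is an oversized step size, which overshoots the minimum and diverges even on a convex problem. A toy sketch on f(w) = w² (the learning-rate values are arbitrary choices for illustration):

```python
# Gradient descent on f(w) = w**2, whose gradient is 2w. With a step
# size above 1.0 the update w <- w - lr*2w flips the sign and grows
# |w| every step, so the loss increases monotonically.
def losses(lr, steps=5, w=1.0):
    out = []
    for _ in range(steps):
        w = w - lr * 2 * w      # gradient step
        out.append(w * w)       # loss after the step
    return out

print(losses(lr=1.1))  # w is multiplied by -1.2 each step: diverges
print(losses(lr=0.1))  # w is multiplied by 0.8 each step: converges
```

Overly strong regularization, by contrast, usually produces a loss that plateaus at a high value rather than one that climbs every epoch.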

5) Learning-curve diagnosis (multiple select)

You are training a sentiment regressor that predicts a score in [-1.0, +1.0] using 47 features.

You observe this pattern:

  • Training error decreases steadily and becomes very low.
  • Validation error decreases initially, then starts increasing and stays much higher than training error.

Select all actions that best address the problem:

  • Reduce the size of the training data
  • Include a regularization component to your model
  • Increase the number of features included in your data
  • Increase the number of epochs used to train your model
  • Choose a more complex modeling technique
  • None of the above
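The pattern described (training error very low, validation error rising) is overfitting, and an L2 penalty is one direct remedy. How regularization shrinks fitted weights can be seen in the one-dimensional case, where ridge regression has a closed form (the data points below are made up for illustration):

```python
# 1-D least squares with an L2 penalty: minimizing
#   sum((y - w*x)**2) + lam * w**2
# has the closed form w = sum(x*y) / (sum(x*x) + lam).
# Increasing lam shrinks w toward 0, reducing model variance.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.3, 3.8]  # illustrative noisy points

def ridge_weight(lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

for lam in [0.0, 1.0, 10.0]:
    print(f"lambda={lam}: w={ridge_weight(lam):.4f}")
```

More data, fewer features, earlier stopping, or a simpler model would also reduce variance; the listed options that increase capacity or epochs would make the gap worse.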

6) Random Forest vs Gradient Boosting (training speed focus) (multiple select)

You need a fast-to-train baseline fraud classifier. You are choosing between a Random Forest and a Gradient Boosting Machine (GBM).

Select all statements that are true and relevant to training speed:

  • (1) GBM fits trees sequentially rather than independently like Random Forest.
  • (2) GBM is harder to overfit than Random Forest.
  • (3) Random Forest is slower than GBM for real-time prediction.
  • (4) GBMs are typically more accurate than Random Forest for anomaly detection.
  • (5) Random Forest has fewer parameters than GBM.
  • (6) None of the above.
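The structural point behind statement (1) — each boosting round fits to the residuals left by the previous round, while forest trees are fit independently — can be sketched with trivial constant learners standing in for trees (an assumption purely to keep the sketch short; real implementations fit decision trees):

```python
import random

ys = [3.0, 5.0, 4.0, 8.0]  # toy regression targets

# Gradient boosting (squared loss): each round fits a learner to the
# CURRENT residuals, so round t depends on round t-1 and the outer
# loop cannot be parallelized across rounds.
def boost(ys, rounds=20, lr=0.5):
    pred = [0.0] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        learner = sum(residuals) / len(residuals)   # "fit" to residuals
        pred = [p + lr * learner for p in pred]
    return pred

# Random-forest style: each "tree" is fit on an independent bootstrap
# sample, so the iterations are embarrassingly parallel.
def forest(ys, n_trees=200, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(ys) for _ in ys]       # bootstrap sample
        trees.append(sum(sample) / len(sample))     # fit independently
    return sum(trees) / len(trees)                  # average the trees

print(boost(ys))   # every entry approaches mean(ys) = 5.0
print(forest(ys))  # also near 5.0
```

The sequential dependence is what makes boosting slower to train round-for-round; forest trees can be built concurrently.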

7) Manual forward pass in a small neural network (numeric)

Compute the output of the following neural network. Assume:

  • All biases are 0.
  • Hidden activations f1 and f2 are linear (identity).
  • Output activation f3 is sigmoid: σ(z) = 1 / (1 + e^(−z)).

Inputs:

  • x1 = 1.2
  • x2 = −0.7

Hidden layer (2 units):

  • h1 = 0.5·x1 + (−1.0)·x2
  • h2 = (−0.25)·x1 + 0.75·x2

Output layer (1 unit):

  • z = 1.0·h1 + 0.5·h2
  • ŷ = σ(z)

Return ŷ rounded to the nearest thousandth (3 decimals).
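The forward pass can be carried out directly with the weights and inputs given in the question:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2 = 1.2, -0.7

# Hidden layer (linear activations, zero biases)
h1 = 0.5 * x1 + (-1.0) * x2      # 0.6 + 0.7 = 1.3
h2 = -0.25 * x1 + 0.75 * x2      # -0.3 - 0.525 = -0.825

# Output layer (sigmoid activation, zero bias)
z = 1.0 * h1 + 0.5 * h2          # 1.3 - 0.4125 = 0.8875
y_hat = sigmoid(z)
print(round(y_hat, 3))           # prints 0.708
```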

