You are taking a timed online assessment with multiple-select and numeric-response questions.
1) Confusion-matrix metrics (multiple select)
A binary classifier is evaluated on 200 examples. For each option below, the confusion matrix is given as (TP, FP, FN, TN).
Select all options where:
- Recall = TP / (TP + FN) is > 0.90, and
- False Positive Rate (FPR) = FP / (FP + TN) is < 0.10.

(A quick checker sketch follows the options.)
Options:
- A: (95, 8, 5, 92)
- B: (92, 12, 8, 88)
- C: (180, 18, 20, 182)
- D: (45, 1, 5, 149)
- E: (91, 9, 9, 91)
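As a sanity check, here is a minimal Python sketch (illustrative, not part of the question) that evaluates each option against both thresholds; the tuples are copied from the options above and unpacked in the stated (TP, FP, FN, TN) order:

```python
# Each option is (TP, FP, FN, TN), matching the question's convention.
options = {
    "A": (95, 8, 5, 92),
    "B": (92, 12, 8, 88),
    "C": (180, 18, 20, 182),
    "D": (45, 1, 5, 149),
    "E": (91, 9, 9, 91),
}

for name, (tp, fp, fn, tn) in options.items():
    recall = tp / (tp + fn)  # TP / (TP + FN)
    fpr = fp / (fp + tn)     # FP / (FP + TN)
    print(f"{name}: recall={recall:.3f}, FPR={fpr:.3f}, "
          f"qualifies={recall > 0.90 and fpr < 0.10}")
```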
2) Ensemble learning for loan-default prediction (multiple select)
You built an initial classifier to predict whether a customer will default on a loan, but its accuracy is only slightly better than chance. You consider using an ensemble method.
Select all statements that are true:
- (1) If the dataset contains both linear and non-linear relationships, ensemble learning can impair performance compared to most approaches.
- (2) Modern ensemble learning techniques can improve overall model interpretability.
- (3) Ensemble learning techniques can be time-intensive to train.
- (4) Ensemble learning techniques typically create overfitted models.
- (5) If the dataset contains both linear and non-linear relationships, ensemble learning can improve performance compared to most approaches.
- (6) None of the above.
3) Decision-tree split criteria (multiple select)
You are choosing an impurity measure to score candidate splits in a classification decision tree.
Select all options that are valid impurity/split criteria:
- Entropy
- Classification Error
- Gini index
- Pruning
- None of the above
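For reference, a small Python sketch (illustrative only) of how entropy, the Gini index, and classification error each score a node's class-probability distribution; the example distribution is hypothetical:

```python
import math

def entropy(p):
    """Shannon entropy: -sum(p_i * log2(p_i)) over nonzero classes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def gini(p):
    """Gini index: 1 - sum of squared class probabilities."""
    return 1 - sum(pi ** 2 for pi in p)

def classification_error(p):
    """Misclassification error: 1 - probability of the majority class."""
    return 1 - max(p)

probs = [0.7, 0.3]  # example class distribution at a candidate node
print(entropy(probs), gini(probs), classification_error(probs))
```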
4) Training loss increases every epoch (multiple select)
You train a model to detect intrusion attempts. You notice that training loss consistently increases every epoch.
Select all rationales that could plausibly cause this:
- Regularization is too high
- Step size (learning rate) is too large
- Regularization is too low
- Step size is too small
- None of the above
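To see how this symptom can arise, here is a toy sketch (illustrative, not from the question): gradient descent on f(w) = w² with a step size large enough that the iterates diverge, so the loss grows every step:

```python
# Gradient descent on f(w) = w^2; the minimizer is w = 0.
w = 1.0
lr = 1.1  # step size above 1.0 makes the update overshoot and diverge

for epoch in range(5):
    grad = 2 * w       # derivative of w^2
    w = w - lr * grad  # each step multiplies w by (1 - 2*lr) = -1.2
    print(epoch, w, w ** 2)  # the "loss" w^2 increases every epoch
```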
5) Learning-curve diagnosis (multiple select)
You are training a sentiment regressor that predicts a score in [−1.0,+1.0] using 47 features.
You observe this pattern:
- Training error decreases steadily and becomes very low.
- Validation error decreases initially, then starts increasing and stays much higher than training error.
Select all actions that best address the problem:
- Reduce the size of the training data
- Include a regularization component in your model
- Increase the number of features included in your data
- Increase the number of epochs used to train your model
- Choose a more complex modeling technique
- None of the above
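For context, a minimal sketch, assuming scikit-learn is available, of adding a regularization component: Ridge regression applies an L2 penalty whose strength is set by alpha (the data here is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 47))  # 47 features, as in the question
y = np.clip(0.5 * X[:, 0] + rng.normal(scale=0.1, size=200), -1.0, 1.0)

model = Ridge(alpha=1.0)  # alpha > 0 adds an L2 penalty on the weights
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```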
6) Random Forest vs Gradient Boosting (training speed focus) (multiple select)
You need a fast-to-train baseline fraud classifier. You are choosing between a Random Forest and a Gradient Boosting Machine (GBM).
Select all statements that are true and relevant to training speed:
- (1) GBM fits trees sequentially rather than independently like Random Forest.
- (2) GBM is harder to overfit than Random Forest.
- (3) Random Forest is slower than GBM for real-time prediction.
- (4) GBMs are typically more accurate than Random Forest for anomaly detection.
- (5) Random Forest has fewer parameters than GBM.
- (6) None of the above.
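A hedged timing sketch, assuming scikit-learn: RandomForestClassifier can fit its trees in parallel (n_jobs=-1), while GradientBoostingClassifier must fit each tree on the residuals of the previous ones, which is the training-speed distinction at issue:

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for model in (RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0),
              GradientBoostingClassifier(n_estimators=100, random_state=0)):
    t0 = time.perf_counter()
    model.fit(X, y)  # RF trees fit in parallel; GBM trees fit one at a time
    print(type(model).__name__, round(time.perf_counter() - t0, 2), "s")
```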
7) Manual forward pass in a small neural network (numeric)
Compute the output of the following neural network. Assume:
- All biases are 0.
- Hidden activations f1 and f2 are linear (identity).
- Output activation f3 is sigmoid: σ(z) = 1 / (1 + e^(−z)).

Inputs:
- x1 = 1.2
- x2 = −0.7

Hidden layer (2 units):
- h1 = 0.5·x1 + (−1.0)·x2
- h2 = (−0.25)·x1 + 0.75·x2

Output layer (1 unit):
- z = 1.0·h1 + 0.5·h2
- ŷ = σ(z)

Return ŷ rounded to the nearest thousandth (3 decimals).
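A minimal Python sketch of this forward pass, with variable names mirroring the question's notation:

```python
import math

x1, x2 = 1.2, -0.7

# Hidden layer: biases are 0 and f1, f2 are identity, so each unit
# is just its weighted sum of the inputs.
h1 = 0.5 * x1 + (-1.0) * x2
h2 = (-0.25) * x1 + 0.75 * x2

# Output layer: weighted sum of hidden units, passed through the sigmoid f3.
z = 1.0 * h1 + 0.5 * h2
y_hat = 1 / (1 + math.exp(-z))  # sigma(z) = 1 / (1 + e^(-z))

print(round(y_hat, 3))
```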