You are taking a timed online assessment with multiple-select and numeric-response questions.
1) Confusion-matrix metrics (multiple select)
A binary classifier is evaluated on 200 examples. For each option below, the confusion matrix is given as (TP, FP, FN, TN).
Select all options where:
- Recall = TP / (TP + FN) is > 0.90, and
- False Positive Rate (FPR) = FP / (FP + TN) is < 0.10.

(A quick checker sketch follows the options.)
Options:
- A: (95, 8, 5, 92)
- B: (92, 12, 8, 88)
- C: (180, 18, 20, 182)
- D: (45, 1, 5, 149)
- E: (91, 9, 9, 91)
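As a sanity check, here is a minimal Python sketch (illustrative, not part of the question) that evaluates each option against both thresholds; the tuples are copied from the options above and unpacked in the stated (TP, FP, FN, TN) order:

```python
# Each option is (TP, FP, FN, TN), matching the question's convention.
options = {
    "A": (95, 8, 5, 92),
    "B": (92, 12, 8, 88),
    "C": (180, 18, 20, 182),
    "D": (45, 1, 5, 149),
    "E": (91, 9, 9, 91),
}

for name, (tp, fp, fn, tn) in options.items():
    recall = tp / (tp + fn)  # TP / (TP + FN)
    fpr = fp / (fp + tn)     # FP / (FP + TN)
    print(f"{name}: recall={recall:.3f}, FPR={fpr:.3f}, "
          f"qualifies={recall > 0.90 and fpr < 0.10}")
```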
2) Ensemble learning for loan-default prediction (multiple select)
You built an initial classifier to predict whether a customer will default on a loan, but its accuracy is only slightly better than chance. You consider using an ensemble method.
Select all statements that are true:
- (1) If the dataset contains both linear and non-linear relationships, ensemble learning can impair performance compared to most approaches.
- (2) Modern ensemble learning techniques can improve overall model interpretability.
- (3) Ensemble learning techniques can be time-intensive to train.
- (4) Ensemble learning techniques typically create overfitted models.
- (5) If the dataset contains both linear and non-linear relationships, ensemble learning can improve performance compared to most approaches.
- (6) None of the above.
3) Decision-tree split criteria (multiple select)
You are choosing an impurity measure to score candidate splits in a classification decision tree.
Select all options that are valid impurity/split criteria:
- Entropy
- Classification Error
- Gini index
- Pruning
- None of the above
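For reference, a small Python sketch (illustrative only) of how entropy, the Gini index, and classification error each score a node's class-probability distribution; the example distribution is hypothetical:

```python
import math

def entropy(p):
    """Shannon entropy: -sum(p_i * log2(p_i)) over nonzero classes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def gini(p):
    """Gini index: 1 - sum of squared class probabilities."""
    return 1 - sum(pi ** 2 for pi in p)

def classification_error(p):
    """Misclassification error: 1 - probability of the majority class."""
    return 1 - max(p)

probs = [0.7, 0.3]  # example class distribution at a candidate node
print(entropy(probs), gini(probs), classification_error(probs))
```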
4) Training loss increases every epoch (multiple select)
You train a model to detect intrusion attempts. You notice that training loss consistently increases every epoch.
Select all rationales that could plausibly cause this:
- Regularization is too high
- Step size (learning rate) is too large
- Regularization is too low
- Step size is too small
- None of the above
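To see how this symptom can arise, here is a toy sketch (illustrative, not from the question): gradient descent on f(w) = w² with a step size large enough that the iterates diverge, so the loss grows every step:

```python
# Gradient descent on f(w) = w^2; the minimizer is w = 0.
w = 1.0
lr = 1.1  # step size above 1.0 makes the update overshoot and diverge

for epoch in range(5):
    grad = 2 * w       # derivative of w^2
    w = w - lr * grad  # each step multiplies w by (1 - 2*lr) = -1.2
    print(epoch, w, w ** 2)  # the "loss" w^2 increases every epoch
```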
5) Learning-curve diagnosis (multiple select)
You are training a sentiment regressor that predicts a score in [−1.0,+1.0] using 47 features.
You observe this pattern:
- Training error decreases steadily and becomes very low.
- Validation error decreases initially, then starts increasing and stays much higher than training error.
Select all actions that best address the problem:
- Reduce the size of the training data
- Include a regularization component in your model
- Increase the number of features included in your data
- Increase the number of epochs used to train your model
- Choose a more complex modeling technique
- None of the above
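For context, a minimal sketch, assuming scikit-learn is available, of adding a regularization component: Ridge regression applies an L2 penalty whose strength is set by alpha (the data here is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 47))  # 47 features, as in the question
y = np.clip(0.5 * X[:, 0] + rng.normal(scale=0.1, size=200), -1.0, 1.0)

model = Ridge(alpha=1.0)  # alpha > 0 adds an L2 penalty on the weights
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```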
6) Random Forest vs Gradient Boosting (training speed focus) (multiple select)
You need a fast-to-train baseline fraud classifier. You are choosing between a Random Forest and a Gradient Boosting Machine (GBM).
Select all statements that are true and relevant to training speed:
- (1) GBM fits trees sequentially rather than independently like Random Forest.
- (2) GBM is harder to overfit than Random Forest.
- (3) Random Forest is slower than GBM for real-time prediction.
- (4) GBMs are typically more accurate than Random Forest for anomaly detection.
- (5) Random Forest has fewer parameters than GBM.
- (6) None of the above.
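A hedged timing sketch, assuming scikit-learn: RandomForestClassifier can fit its trees in parallel (n_jobs=-1), while GradientBoostingClassifier must fit each tree on the residuals of the previous ones, which is the training-speed distinction at issue:

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for model in (RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0),
              GradientBoostingClassifier(n_estimators=100, random_state=0)):
    t0 = time.perf_counter()
    model.fit(X, y)  # RF trees fit in parallel; GBM trees fit one at a time
    print(type(model).__name__, round(time.perf_counter() - t0, 2), "s")
```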
7) Manual forward pass in a small neural network (numeric)
Compute the output of the following neural network. Assume:
- All biases are 0.
- Hidden activations f1 and f2 are linear (identity).
- Output activation f3 is sigmoid: σ(z) = 1 / (1 + e^(−z)).

Inputs:
- x1 = 1.2
- x2 = −0.7

Hidden layer (2 units):
- h1 = 0.5·x1 + (−1.0)·x2
- h2 = (−0.25)·x1 + 0.75·x2

Output layer (1 unit):
- z = 1.0·h1 + 0.5·h2
- ŷ = σ(z)

Return ŷ rounded to the nearest thousandth (3 decimals).
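A minimal Python sketch of this forward pass, with variable names mirroring the question's notation:

```python
import math

x1, x2 = 1.2, -0.7

# Hidden layer: biases are 0 and f1, f2 are identity, so each unit
# is just its weighted sum of the inputs.
h1 = 0.5 * x1 + (-1.0) * x2
h2 = (-0.25) * x1 + 0.75 * x2

# Output layer: weighted sum of hidden units, passed through the sigmoid f3.
z = 1.0 * h1 + 0.5 * h2
y_hat = 1 / (1 + math.exp(-z))  # sigma(z) = 1 / (1 + e^(-z))

print(round(y_hat, 3))
```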