You are interviewing for an Applied Scientist role.
- For a binary classification problem, explain the following and when you would use each:
  - Precision, recall, F1
  - Confusion matrix terms (TP/FP/TN/FN)
  - ROC curve and AUC
  - (Optionally) Precision–Recall curve and why it can be preferable under class imbalance
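For reference, the quantities listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed answer: the labels, scores, and the 0.5 decision threshold are made-up assumptions, and the function names are hypothetical. AUC is computed here through its ranking interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half).

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example data (assumption, for illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)   # (3, 1, 3, 1)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)  # 0.75, 0.75, 0.75
auc = roc_auc(y_true, scores)                       # 13 of 16 pairs -> 0.8125
```

Precision, recall, and F1 depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds. That difference is usually the starting point for discussing when to use each.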
- Explain the difference between L1 and L2 regularization:
  - The mathematical form added to the loss
  - The effect on learned weights (e.g., sparsity)
  - Practical guidance on when you would choose L1 vs L2
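To make the contrast concrete, here is a small sketch of the two penalty terms and of the shrinkage step each one induces. It assumes the common conventions lambda * sum(|w_i|) for L1 and lambda * sum(w_i^2) for L2 (some texts use lambda/2 for L2). The weight vector and the function names are hypothetical. The shrinkage steps are the proximal operators of the two penalties, which is one standard way to see why L1 produces exact zeros while L2 only scales weights toward zero.

```python
from typing import List, Sequence


def l1_penalty(weights: Sequence[float], lam: float) -> float:
    """L1 term added to the loss: lam * sum(|w_i|)."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights: Sequence[float], lam: float) -> float:
    """L2 term added to the loss: lam * sum(w_i ** 2)."""
    return lam * sum(w * w for w in weights)


def l1_shrink(weights: Sequence[float], t: float) -> List[float]:
    """Soft-thresholding: proximal step for t * sum(|w_i|).

    Weights with magnitude <= t become exactly zero (sparsity);
    larger weights move toward zero by a constant amount t.
    """
    out = []
    for w in weights:
        if w > t:
            out.append(w - t)
        elif w < -t:
            out.append(w + t)
        else:
            out.append(0.0)
    return out


def l2_shrink(weights: Sequence[float], t: float) -> List[float]:
    """Proximal step for t * sum(w_i ** 2): every weight is scaled by 1 / (1 + 2t).

    Weights get smaller but nonzero weights stay nonzero.
    """
    return [w / (1.0 + 2.0 * t) for w in weights]


# Hypothetical weight vector, for illustration only.
w = [2.0, -0.3, 0.05, -1.5]

w_l1 = l1_shrink(w, t=0.5)  # small weights are zeroed out
w_l2 = l2_shrink(w, t=0.5)  # all weights are halved, none become zero

num_zero_l1 = sum(1 for x in w_l1 if x == 0.0)
num_zero_l2 = sum(1 for x in w_l2 if x == 0.0)
```

In this toy case the L1 step zeroes the two small weights and the L2 step zeroes none. That mirrors the usual guidance: L1 when feature selection or a sparse model is wanted, L2 when the goal is to shrink many correlated weights smoothly.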
- You have an NLP model with multiple components (e.g., preprocessing, encoder choice, retrieval module, prompt template, reranker, decoding settings). Describe how you would design an ablation study to identify which components materially contribute to performance, including:
  - What you keep constant vs vary
  - How you avoid confounders
  - How you decide whether a change is significant
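One possible way to structure such a study is sketched below, under stated assumptions: each ablated variant changes exactly one component relative to a fixed baseline, and every variant is scored on the same evaluation examples so that per-example differences can be compared with a paired bootstrap. The configuration values, function names, and the choice of a 95% percentile interval are illustrative assumptions, not a prescribed protocol.

```python
import random
from typing import Dict, List, Sequence, Tuple


def leave_one_out_configs(baseline: Dict[str, str],
                          ablated_values: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Build one variant per component, changing only that component.

    Everything else is copied from the baseline, so any score change
    can be attributed to the single component that differs.
    """
    configs = {}
    for component, value in ablated_values.items():
        if component not in baseline:
            raise KeyError(f"unknown component: {component}")
        variant = dict(baseline)
        variant[component] = value
        configs[f"ablate_{component}"] = variant
    return configs


def paired_bootstrap(full_scores: Sequence[float],
                     ablated_scores: Sequence[float],
                     n_resamples: int = 2000,
                     seed: int = 0) -> Tuple[float, float, float]:
    """Return (mean difference, CI lower, CI upper) for full minus ablated.

    Scores must be per-example and aligned: index i refers to the same
    evaluation example in both systems. The interval is a 95% percentile
    bootstrap over resampled examples.
    """
    if len(full_scores) != len(ablated_scores):
        raise ValueError("score lists must be aligned and of equal length")
    rng = random.Random(seed)  # fixed seed keeps the analysis reproducible
    deltas = [f - a for f, a in zip(full_scores, ablated_scores)]
    n = len(deltas)
    means: List[float] = []
    for _ in range(n_resamples):
        sample = [deltas[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return sum(deltas) / n, lower, upper


# Hypothetical baseline and ablation values, for illustration only.
baseline = {"retrieval": "on", "reranker": "on", "prompt_template": "v2"}
variants = leave_one_out_configs(
    baseline, {"retrieval": "off", "reranker": "off", "prompt_template": "v1"}
)

# Made-up per-example correctness (1 = correct) on the same 10 examples.
full = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
no_reranker = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
mean_diff, ci_low, ci_high = paired_bootstrap(full, no_reranker)
```

If the interval for a component excludes zero, that is evidence the component contributes on this evaluation set. With many components, the number of comparisons and the variance across training or decoding seeds would also need to be accounted for.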