Answer the following traditional ML questions:
- Data leakage
  - What is data leakage?
  - Give 2–3 common examples.
  - How do you prevent or fix it in practice?
- Missing data
  - What are common strategies to handle missing values?
  - When might you drop rows/columns vs impute?
  - How can missingness itself be informative?
- Linear vs logistic regression losses
  - What loss is commonly used for linear regression? For logistic regression?
  - Compare MSE vs MAE: how do they differ, and when might you prefer one?
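
For reference on the MSE vs MAE comparison, here is a minimal sketch in plain Python. It assumes a made-up set of target values with a single outlier, and the helper names `mse` and `mae` are illustrative rather than taken from any library. It compares two constant predictions, the mean and the median of the targets, under each loss.

```python
import statistics

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets for illustration; the last value is a deliberate outlier.
y_true = [1.0, 2.0, 3.0, 4.0, 100.0]

# Two constant predictors: always predict the mean, or always predict the median.
mean_pred = [statistics.mean(y_true)] * len(y_true)
median_pred = [statistics.median(y_true)] * len(y_true)

print("MSE  mean-pred:", mse(y_true, mean_pred), " median-pred:", mse(y_true, median_pred))
print("MAE  mean-pred:", mae(y_true, mean_pred), " median-pred:", mae(y_true, median_pred))
```

On this toy data the mean prediction scores better under MSE and the median prediction scores better under MAE. The outlier's squared residual dominates MSE, while MAE weights every residual linearly.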