This question evaluates a data scientist's competencies in data preprocessing (handling missing values), algorithm selection and justification, model interpretation (contrasting Random Forests with linear regression), and understanding of model generalization (overfitting and underfitting).
Context: Technical phone screen for a Data Scientist role. Assume primarily tabular datasets; address both classification and regression where relevant.
(a) How would you handle missing values before model training, and why?
(b) Given a business scenario, how would you choose an appropriate ML algorithm and justify your choice?
(c) Explain Random Forests in lay terms and contrast them with linear regression.
(d) Define overfitting and underfitting, and describe methods to detect and mitigate each.
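For (a), one common approach is to learn imputation statistics from the training split only, so that no information leaks in from the test data. It also helps to add a 0/1 "was missing" indicator, so the model can still use the fact that a value was absent. The sketch below assumes purely numeric columns with `None` as the missing marker. `fit_medians` and `transform` are hypothetical helper names. Median imputation is only one reasonable choice; dropping rows, model-based imputation, or tree models that handle missing values natively are alternatives.

```python
from statistics import median
from typing import Optional

Row = list[Optional[float]]  # None marks a missing value


def fit_medians(train: list[Row]) -> list[float]:
    """Learn one median per column from the TRAINING rows only, ignoring None."""
    medians = []
    for j in range(len(train[0])):
        observed = [row[j] for row in train if row[j] is not None]
        # Fall back to 0.0 if a column is entirely missing (an arbitrary choice).
        medians.append(median(observed) if observed else 0.0)
    return medians


def transform(rows: list[Row], medians: list[float]) -> list[list[float]]:
    """Fill gaps with the training medians and append one 0/1 'was missing' flag per column."""
    out = []
    for row in rows:
        filled = [medians[j] if v is None else v for j, v in enumerate(row)]
        flags = [1.0 if v is None else 0.0 for v in row]
        out.append(filled + flags)
    return out


train = [[1.0, 10.0], [None, 20.0], [3.0, None], [5.0, 40.0]]
test = [[None, None], [2.0, 30.0]]

medians = fit_medians(train)      # statistics come from train only
print(transform(test, medians))   # the same medians are reused on test rows
```

Fitting on the training split and reusing those values later mirrors prediction time, when new rows arrive without statistics of their own.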
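For (c), a random forest can be described as a committee of decision trees. Each tree is trained on a different random resample of the rows and, in a full forest, considers a random subset of features at each split. The committee's predictions are averaged. Linear regression instead fits a single straight-line (or hyperplane) relationship. The sketch below is a deliberately simplified stand-in, not a real random forest. It bags depth-1 trees ("stumps") on a single feature, so it has no feature subsampling and no deep trees, and all function names are hypothetical. On a step-shaped target the single line cannot bend, while the averaged trees follow the jump.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one feature)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return lambda x: (my - b * mx) + b * x


def fit_stump(xs, ys):
    """Depth-1 regression tree: pick the threshold that minimizes squared error."""
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # all x identical: no split possible, predict the mean
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap resample of the rows, then average predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)


def model_mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic step-shaped target: jumps from 0 to 10 at x = 10.
xs = [float(x) for x in range(20)]
ys = [0.0 if x < 10 else 10.0 for x in xs]

line = fit_line(xs, ys)
forest_like = fit_bagged_stumps(xs, ys)
print(f"linear regression MSE: {model_mse(line, xs, ys):.2f}")
print(f"bagged stumps MSE:     {model_mse(forest_like, xs, ys):.2f}")
```

The trade-off this illustrates is that linear regression gives one coefficient per feature that is easy to read, while tree ensembles capture thresholds and interactions at the cost of that direct interpretability.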
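For (d), a quick way to see both failure modes is to sweep one complexity setting and compare training error with held-out error. The sketch below uses 1-D k-nearest-neighbors regression on a synthetic noisy sine curve purely as an illustration; the data, the k values, and the helper names are assumptions. With k = 1 each training point is its own nearest neighbor, so training error is exactly zero while validation error is not (overfitting). With k equal to the training-set size every prediction is the training mean, so both errors are typically high (underfitting). An intermediate k usually does better on the validation points.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x (1-D k-NN regression)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def knn_mse(xs, ys, train_x, train_y, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)
xs = [i / 10 for i in range(40)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]  # noisy sine curve

# Alternate points: even indices for training, odd indices for validation.
train_x, train_y = xs[0::2], ys[0::2]
val_x, val_y = xs[1::2], ys[1::2]

for k in (1, 5, len(train_x)):
    train_err = knn_mse(train_x, train_y, train_x, train_y, k)
    val_err = knn_mse(val_x, val_y, train_x, train_y, k)
    print(f"k={k:2d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```

The same train-versus-validation comparison applies to tree depth, regularization strength, or the number of boosting rounds. Cross-validation gives a more stable estimate than a single split.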