This question evaluates core probability and statistics competencies for machine learning: probability basics, descriptive statistics, correlation versus independence, linear regression and regularization, and dimensionality reduction (PCA). It falls under the Machine Learning domain for Data Scientist roles and tests both conceptual understanding and practical application. Multi-part theory questions like this one are commonly asked to probe understanding of fundamental statistical concepts and modeling assumptions, checking that the candidate can reason about uncertainty, estimator properties, multicollinearity, and eigenstructure without relying solely on implementation details.
Answer the following short theory questions (you may use equations and brief examples):