
Estimate b when features exceed samples

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in linear regression theory, including identifiability and the sampling distribution of OLS, together with high-dimensional competencies such as regularization, variable selection, dimensionality reduction, properties of the Moore–Penrose pseudoinverse, and the statistical consequences of naive upsampling.


Company: Google

Role: Data Scientist

Category: Machine Learning

Difficulty: Medium

Interview Round: Technical Screen

Consider the linear model y = Xb + ε with X ∈ R^{n×(m+1)} including an intercept.

a) Derive the OLS estimator b̂ = (XᵀX)^{-1}Xᵀy, stating the rank conditions for identifiability and the sampling distribution of b̂ under classical assumptions.

b) Now suppose m > n. Describe at least three viable approaches (e.g., ridge: b̂_ridge = (XᵀX + λI)^{-1}Xᵀy; lasso; elastic net; forward selection; PCA/PLS), including how you would choose λ and check generalization (cross-validation details).

c) When does the Moore–Penrose pseudoinverse give a reasonable minimum-norm solution, and what are its drawbacks?

d) Explain why naive upsampling of rows does not resolve rank deficiency and can harm inference.
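For part (a), a compact derivation sketch in LaTeX; it assumes the classical setup (fixed X with full column rank, homoskedastic Gaussian errors):

```latex
\begin{align}
  \hat{b} &= \arg\min_{b}\ \lVert y - Xb \rVert_2^2
    && \text{least-squares objective} \\
  X^\top X \hat{b} &= X^\top y
    && \text{normal equations, from } \nabla_b = -2X^\top(y - Xb) = 0 \\
  \hat{b} &= (X^\top X)^{-1} X^\top y
    && \text{needs } \operatorname{rank}(X) = m+1 \text{, so } n \ge m+1 \\
  \hat{b} &\sim \mathcal{N}\!\big(b,\ \sigma^2 (X^\top X)^{-1}\big)
    && \text{if } \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)
\end{align}
```

When rank(X) < m+1, the normal equations have infinitely many solutions and b is not identifiable; this is exactly the situation part (b) addresses.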
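For part (b), a minimal Python sketch of ridge and lasso with λ chosen by K-fold cross-validation; the synthetic data, the alpha grid, and the use of scikit-learn here are illustrative assumptions, not part of the original question:

```python
# Minimal sketch, assuming scikit-learn is available: choose lambda (alpha)
# by 5-fold CV for ridge and lasso on a synthetic m > n problem.
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, m = 50, 200                        # more features than samples
X = rng.standard_normal((n, m))
beta = np.zeros(m)
beta[:5] = 2.0                        # sparse ground truth (assumed)
y = X @ beta + 0.5 * rng.standard_normal(n)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
alphas = np.logspace(-3, 3, 25)       # lambda grid on a log scale

ridge = RidgeCV(alphas=alphas, cv=cv).fit(X, y)
lasso = LassoCV(cv=cv, random_state=0).fit(X, y)
print("ridge lambda:", ridge.alpha_)
print("lasso lambda:", lasso.alpha_,
      "| nonzero coefs:", int((lasso.coef_ != 0).sum()))
```

Held-out error across the CV folds is what justifies the chosen λ; reporting a final score on a test split kept outside the CV loop is the generalization check.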
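For part (c), a short NumPy illustration of the minimum-norm property; the shapes are arbitrary assumptions:

```python
# Sketch: with m > n and X of full row rank, the pseudoinverse solution
# interpolates the training data and has the smallest L2 norm among all
# coefficient vectors that do so.
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 100
X = rng.standard_normal((n, m))
y = rng.standard_normal(n)

b_pinv = np.linalg.pinv(X) @ y                    # minimum-norm solution
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]    # lstsq returns the same one

print(np.allclose(X @ b_pinv, y))    # True: zero training residual
print(np.allclose(b_pinv, b_lstsq))  # True: both are the min-norm solution
```

The zero training residual is also the drawback: it leaves no honest estimate of the noise level, and small singular values of X make the solution numerically and statistically unstable.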
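For part (d), a quick NumPy check that duplicating rows leaves the rank unchanged; the sizes are arbitrary:

```python
# Sketch: naive upsampling copies rows, so the row space of X is unchanged
# and rank(X) cannot increase; X^T X stays singular when m > n.
import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 50
X = rng.standard_normal((n, m))
X_up = np.vstack([X, X, X])           # 3x row duplication

print(np.linalg.matrix_rank(X))       # 10
print(np.linalg.matrix_rank(X_up))    # still 10: no new information
```

Worse, treating duplicated rows as independent observations inflates the apparent sample size from n to 3n, so any naive inference understates uncertainty (in the identifiable case, standard errors shrink by about √3 with no real gain in information).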


