PracHub

Estimate b when features exceed samples

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in linear regression theory, including identifiability and the sampling distribution of OLS, together with high-dimensional competencies such as regularization, variable selection, dimensionality reduction, properties of the Moore–Penrose pseudoinverse, and the statistical consequences of naive upsampling.

Company: Google

Role: Data Scientist

Category: Machine Learning

Difficulty: Medium

Interview Round: Technical Screen

Consider the linear model y = Xb + ε with X ∈ R^{n×(m+1)} including an intercept.

a) Derive the OLS estimator b̂ = (XᵀX)^{-1}Xᵀy, stating the rank conditions for identifiability and the sampling distribution of b̂ under classical assumptions.

b) Now suppose m > n. Describe at least three viable approaches (e.g., ridge: b̂_ridge = (XᵀX + λI)^{-1}Xᵀy; lasso; elastic net; forward selection; PCA/PLS), including how you would choose λ and check generalization (cross‑validation details).

c) When does the Moore–Penrose pseudoinverse give a reasonable minimum‑norm solution, and what are its drawbacks?

d) Explain why naive upsampling of rows does not resolve rank deficiency and can harm inference.
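The quantities named in the question can be checked numerically. The following is a minimal NumPy sketch, not a reference answer: the sizes n = 20 and m = 50, the simulated data, the noise level, the λ grid, and the 5-fold split are all illustrative assumptions, and the helper names ridge_fit and cv_mse are hypothetical. It applies the ridge formula exactly as written in the question (penalty λI, so the intercept is penalized too); in practice one would usually standardize features inside each fold and leave the intercept unpenalized.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 50  # assumed sizes with m > n

# Simulated design matrix: intercept column plus m Gaussian features (assumption).
X = np.column_stack([np.ones(n), rng.standard_normal((n, m))])
b_true = np.zeros(m + 1)
b_true[:4] = [1.0, 2.0, -1.5, 0.5]
y = X @ b_true + 0.1 * rng.standard_normal(n)

# (a) OLS needs rank(X) = m + 1. Here rank(X) <= n < m + 1, so X'X is singular
# and b is not identifiable from the data alone.
rank_X = np.linalg.matrix_rank(X)


def ridge_fit(X, y, lam):
    """Ridge estimate (X'X + lam*I)^{-1} X'y, as written in the question."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# (b) Choose lambda by K-fold cross-validation; the folds are fixed once so
# every lambda is scored on the same splits.
folds = np.array_split(rng.permutation(n), 5)


def cv_mse(lam):
    """Mean held-out squared error of ridge over the fixed folds."""
    errs = []
    for hold in folds:
        train = np.setdiff1d(np.arange(n), hold)
        b = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[hold] - X[hold] @ b) ** 2))
    return float(np.mean(errs))


lambdas = np.logspace(-3, 3, 13)
cv_scores = [cv_mse(lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(cv_scores))]
b_ridge = ridge_fit(X, y, best_lam)

# (c) The pseudoinverse returns the minimum-norm least-squares solution.
# With full row rank it interpolates the training data, noise included.
b_pinv = np.linalg.pinv(X) @ y
train_residual = np.linalg.norm(X @ b_pinv - y)

# (d) Naive upsampling: duplicated rows lie in the existing row space,
# so the rank of the stacked matrix does not increase.
dup = rng.integers(0, n, size=200)
X_up = np.vstack([X, X[dup]])
rank_up = np.linalg.matrix_rank(X_up)

print("rank(X):", rank_X, "of", m + 1, "columns")
print("best lambda by 5-fold CV:", best_lam)
print("pinv training residual:", train_residual)
print("rank after upsampling:", rank_up)
```

Because the duplicated rows add no new directions, the upsampled matrix has the same rank as X, while any standard errors computed from the inflated row count would treat the copies as independent observations and come out too small. The pseudoinverse fit driving the training residual to roughly zero is the drawback part c) asks about: it fits the noise, whereas ridge with a cross-validated λ trades some bias for lower variance.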

Oct 13, 2025, 9:49 PM