Derive L1 vs L2 effects with correlation
Company: TikTok
Role: Data Scientist
Category: Statistics & Math
Difficulty: medium
Interview Round: Technical Screen
Two standardized predictors x1 and x2 have corr(x1,x2)=0.99. You observe X'X = [[100,99],[99,100]] and X'y = [120,120].
(1) Compute the ridge (L2) estimate w_ridge = (X'X + λI)^{-1} X'y for λ=10 (give numeric weights to 2 decimal places) and explain why L2 prefers sharing weight.
(2) Without solving the full LASSO, argue which coefficient pattern the L1 solution is likely to produce for λ1=10 (both similar and small, one near zero and the other large, or something else) and why, using the geometry of the L1 and L2 constraint sets under high collinearity.
(3) Propose elastic-net penalties (α and λ) that would stabilize selection while controlling variance. Justify how you would tune α and λ, and state which validation metric you would pick if the goal is sparse interpretation with minimal loss in PR-AUC.
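The ridge arithmetic in part (1) can be checked numerically. The following is a minimal sketch using NumPy, with the matrices copied from the question; the variable names (XtX, Xty, lam, w_ridge, w_ols) are illustrative choices, and the unpenalized solve is included only for comparison.

```python
import numpy as np

# Quantities given in the question.
XtX = np.array([[100.0, 99.0], [99.0, 100.0]])
Xty = np.array([120.0, 120.0])
lam = 10.0

# Ridge: solve (X'X + lam*I) w = X'y instead of forming the inverse explicitly.
w_ridge = np.linalg.solve(XtX + lam * np.eye(2), Xty)

# Unpenalized least squares (lam = 0), for comparison only.
w_ols = np.linalg.solve(XtX, Xty)

print(np.round(w_ridge, 2))  # both weights round to 0.57
print(np.round(w_ols, 2))    # both weights round to 0.6
```

By symmetry each ridge weight is 120 / (100 + 99 + 10) = 120/209 ≈ 0.57. In eigen-terms, X'X has eigenvalue 199 along the (1, 1) direction and 1 along (1, -1); adding λ=10 shrinks the first by a factor of 199/209 and the second by 1/11, so ridge suppresses the poorly determined difference x1 - x2 far more than the shared signal. That is one way to explain the preference for sharing weight.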
Quick Answer: This question evaluates understanding of regularization methods (ridge/L2, LASSO/L1, and elastic net), multicollinearity effects, geometric intuition of constraint sets, and hyperparameter influence on coefficient patterns in the Statistics & Math domain.
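The contrast between the three penalties can be explored with a small coordinate-descent sketch that works directly on X'X and X'y. Several things here are assumptions rather than part of the question: the unnormalized objective 0.5*w'(X'X)w - (X'y)'w + l1*||w||_1 + 0.5*l2*||w||^2 (libraries often rescale by the sample size, which changes the effective λ), the separate l1/l2 parameterization in place of α and λ, the helper names, and the perturbed vector X'y = [121, 119], which is a hypothetical used only to probe stability.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def penalized_fit(G, b, l1, l2=0.0, sweeps=5000):
    """Coordinate descent on 0.5*w'Gw - b'w + l1*||w||_1 + 0.5*l2*||w||^2.

    G plays the role of X'X and b the role of X'y (assumed objective scaling).
    """
    w = np.zeros(len(b))
    for _ in range(sweeps):
        for j in range(len(b)):
            # b_j minus the contribution of all other coordinates.
            rho = b[j] - G[j] @ w + G[j, j] * w[j]
            w[j] = soft_threshold(rho, l1) / (G[j, j] + l2)
    return w

G = np.array([[100.0, 99.0], [99.0, 100.0]])
b_given = np.array([120.0, 120.0])   # from the question
b_pert = np.array([121.0, 119.0])    # hypothetical small perturbation

w_l1_sym = penalized_fit(G, b_given, l1=10.0)           # approx. (0.55, 0.55)
w_l1_pert = penalized_fit(G, b_pert, l1=10.0)           # approx. (1.11, 0.00)
w_en_pert = penalized_fit(G, b_pert, l1=10.0, l2=10.0)  # approx. (0.62, 0.44)

for name, w in [("L1, given X'y", w_l1_sym),
                ("L1, perturbed X'y", w_l1_pert),
                ("elastic net, perturbed X'y", w_en_pert)]:
    print(f"{name}: {np.round(w, 2)}")
```

Under these assumptions, the exactly symmetric X'y yields equal L1 weights, but a change of about 1% in X'y is enough to push all the weight onto one predictor, which is the selection instability the geometric argument in part (2) is meant to capture. Adding an L2 term in this sketch keeps both coefficients nonzero on the perturbed input. In practice the mixing and strength parameters would be tuned by cross-validation rather than fixed at these illustrative values.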