Implement R² and Compare PCA With/Without Scaling
Company: CVS Health
Role: Data Scientist
Category: Machine Learning
Difficulty: Medium
Interview Round: Take-home Project
Write Python (NumPy only; no scikit-learn) for both parts.

(a) Implement r2_score(y_true, y_pred) so that it returns 1.0 if the predictions exactly equal y_true; returns -inf if var(y_true) = 0 and the predictions are not perfect; and otherwise computes 1 - SS_res/SS_tot in float64, guarding against division by zero. Test the function on y_true=[3, -1, 2, 7, 5] with y_pred=[2.5, -0.5, 2.1, 7.8, 5.2], and on the edge cases y_true=[4, 4, 4, 4] with y_pred=[4, 4, 4, 4] and with y_pred=[4, 4, 5, 3]. Show the numeric outputs and briefly explain them.

(b) Given the 6×3 matrix X below, compute PCA twice: (i) on raw X and (ii) on standardized X (each column scaled to zero mean and unit variance). In each case, center the data appropriately, compute the covariance matrix, obtain its eigenvalues and eigenvectors, sort them by eigenvalue in descending order, report explained_variance_ratio_ for the first two components, and print the first principal component vector. Discuss how scaling changes the components and why eigenvector signs may flip without changing the subspace.
X = [[10, 200, 0.50],
     [12, 220, 0.40],
     [ 9, 210, 0.55],
     [11, 230, 0.60],
     [ 8, 190, 0.45],
     [13, 240, 0.65]]
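For part (a), the rules in the prompt can be sketched as follows. This is one possible reading of the specification, not a reference solution. The shape check is an added assumption that the prompt does not ask for.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination, following the rules stated in part (a)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)

    # Assumption (not in the prompt): mismatched shapes are treated as an error.
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    # Rule 1: exact predictions score 1.0, even when y_true is constant.
    if np.array_equal(y_true, y_pred):
        return 1.0

    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares

    # Rule 2: constant target with imperfect predictions gives -inf.
    # Returning here also avoids the division by zero below.
    if ss_tot == 0.0:
        return float("-inf")

    return float(1.0 - ss_res / ss_tot)


print(r2_score([3, -1, 2, 7, 5], [2.5, -0.5, 2.1, 7.8, 5.2]))  # approx. 0.9677
print(r2_score([4, 4, 4, 4], [4, 4, 4, 4]))                    # 1.0
print(r2_score([4, 4, 4, 4], [4, 4, 5, 3]))                    # -inf
```

In the first test the mean of y_true is 3.2, so SS_tot = 36.8 and SS_res = 1.19, giving R² = 1 - 1.19/36.8 ≈ 0.9677. In both edge cases y_true is constant, so SS_tot = 0: perfect predictions return 1.0 through the first rule, and imperfect ones return -inf through the second.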
Quick Answer: This question evaluates a candidate's proficiency with regression evaluation metrics and linear-algebra-based dimensionality reduction, specifically implementing a numerically robust R² scorer and performing PCA with and without feature standardization using NumPy.
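For part (b), a minimal sketch of the PCA steps using an eigendecomposition of the covariance matrix. The helper name pca, the use of the sample estimator (ddof=1) for both the standard deviation and the covariance, and the sign convention are all assumptions made for this illustration. The prompt does not fix any of them.

```python
import numpy as np

X = np.array([[10, 200, 0.50],
              [12, 220, 0.40],
              [ 9, 210, 0.55],
              [11, 230, 0.60],
              [ 8, 190, 0.45],
              [13, 240, 0.65]], dtype=np.float64)


def pca(data, standardize=False):
    """Return (eigenvalues, eigenvectors, explained variance ratio), sorted descending."""
    centered = data - data.mean(axis=0)
    if standardize:
        # Assumption: sample standard deviation (ddof=1), matching the covariance below.
        centered = centered / data.std(axis=0, ddof=1)

    n_samples = centered.shape[0]
    cov = centered.T @ centered / (n_samples - 1)

    # eigh is intended for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Hypothetical sign convention: make the largest-magnitude entry of each
    # eigenvector positive so that repeated runs print the same vectors.
    rows = np.argmax(np.abs(eigvecs), axis=0)
    signs = np.sign(eigvecs[rows, np.arange(eigvecs.shape[1])])
    eigvecs = eigvecs * signs

    ratio = eigvals / eigvals.sum()
    return eigvals, eigvecs, ratio


vals_raw, vecs_raw, ratio_raw = pca(X, standardize=False)
vals_std, vecs_std, ratio_std = pca(X, standardize=True)

print("raw explained_variance_ratio_[:2]:", ratio_raw[:2])
print("raw PC1:", vecs_raw[:, 0])
print("std explained_variance_ratio_[:2]:", ratio_std[:2])
print("std PC1:", vecs_std[:, 0])
```

The sample variances of the three columns are 3.5, 350, and 0.00875, so on raw X the first component points almost entirely along the second column, simply because that column has the largest units. After standardization every column has unit variance, the covariance matrix becomes the correlation matrix, and the first component spreads its weight across all three features. Signs are arbitrary because if v satisfies Cv = λv then so does -v, and both vectors span the same line. The sign convention in the sketch only makes the printed output reproducible.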