NumPy-only implementation: R² and PCA (Data Scientist take-home)
Implement from scratch using only NumPy (no scikit-learn). Use float64 throughout and clearly show numeric results where requested.
(a) r2_score(y_true, y_pred)
Write a function r2_score(y_true, y_pred) that:
- Returns 1.0 if the predictions are exactly equal to y_true (elementwise equality).
- Returns -inf if var(y_true) = 0 (i.e., all y_true values are identical) and the predictions are not perfect.
- Otherwise computes R² as 1 − SS_res/SS_tot, where
  - SS_res = sum((y_true − y_pred)²),
  - SS_tot = sum((y_true − mean(y_true))²).
- Guards against division by zero using the rules above (a minimal sketch follows this list).
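One minimal NumPy sketch consistent with these rules (the structure and names are illustrative, not the required solution):

import numpy as np

def r2_score(y_true, y_pred):
    # Work in float64 throughout, as required.
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # Exact elementwise equality -> perfect score.
    if np.array_equal(y_true, y_pred):
        return 1.0
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    # Constant y_true with imperfect predictions -> -inf (avoids 0/0).
    if ss_tot == 0.0:
        return float("-inf")
    ss_res = np.sum((y_true - y_pred) ** 2)
    return 1.0 - ss_res / ss_tot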
Test on:
- y_true = [3, -1, 2, 7, 5], y_pred = [2.5, -0.5, 2.1, 7.8, 5.2]
- Edge cases with y_true = [4, 4, 4, 4]:
  - y_pred = [4, 4, 4, 4]
  - y_pred = [4, 4, 5, 3]
Print the numeric outputs and briefly explain each.
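For example, the tests could be driven with a few print calls (the expected qualitative outcomes follow from the rules above; compute the exact numbers with your implementation):

print(r2_score([3, -1, 2, 7, 5], [2.5, -0.5, 2.1, 7.8, 5.2]))  # close predictions -> R² near 1
print(r2_score([4, 4, 4, 4], [4, 4, 4, 4]))                    # exact match on a constant target -> 1.0
print(r2_score([4, 4, 4, 4], [4, 4, 5, 3]))                    # constant target, imperfect predictions -> -inf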
(b) PCA on raw vs standardized features
Given the 6×3 matrix X:
X = [[10, 200, 0.50],
[12, 220, 0.40],
[ 9, 210, 0.55],
[11, 230, 0.60],
[ 8, 190, 0.45],
[13, 240, 0.65]]
Compute PCA twice:
- On raw X (center columns by their mean before computing the covariance).
- On standardized X (column-wise zero mean, unit variance; use the sample standard deviation with ddof=1), then run PCA on the standardized matrix.
For each case:
- Center appropriately, then compute the covariance matrix S = (X_centered^T X_centered)/(n−1).
- Obtain eigenvalues/eigenvectors (use np.linalg.eigh) and sort by eigenvalue in descending order.
- Report the explained_variance_ratio for the first two components.
- Print the first principal component vector (the eigenvector for the largest eigenvalue; note that its sign is arbitrary). A combined sketch for both cases follows this list.
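A compact sketch covering both runs (the helper name pca_summary is illustrative, not prescribed):

import numpy as np

X = np.array([[10, 200, 0.50],
              [12, 220, 0.40],
              [ 9, 210, 0.55],
              [11, 230, 0.60],
              [ 8, 190, 0.45],
              [13, 240, 0.65]], dtype=np.float64)

def pca_summary(M):
    # M is assumed already centered (and possibly standardized); sample covariance with n-1.
    n = M.shape[0]
    S = (M.T @ M) / (n - 1)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # reorder descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()            # explained_variance_ratio
    return ratio, eigvecs

# Raw case: center columns only.
Xc = X - X.mean(axis=0)
ratio_raw, vecs_raw = pca_summary(Xc)
print("raw PCA:          ratio[:2] =", ratio_raw[:2], " PC1 =", vecs_raw[:, 0])

# Standardized case: zero mean, unit variance with sample std (ddof=1).
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
ratio_std, vecs_std = pca_summary(Xs)
print("standardized PCA: ratio[:2] =", ratio_std[:2], " PC1 =", vecs_std[:, 0])

Because np.linalg.eigh may return either sign for each eigenvector, the printed PC1 can differ by an overall sign from other implementations without being wrong.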
Discuss:
- How scaling (standardizing) changes the principal components and their explained variance.
- Why eigenvector signs may flip without changing the subspace.