PracHub

Implement R² and Compare PCA With/Without Scaling

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's proficiency with regression evaluation metrics and linear-algebra-based dimensionality reduction, specifically implementing a numerically robust R² scorer and performing PCA with and without feature standardization using NumPy.

Company: CVS Health

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Take-home Project

Write Python (NumPy only; no scikit-learn) for both parts.

(a) Implement r2_score(y_true, y_pred) so that it returns 1.0 if the predictions exactly equal y_true; returns -inf if var(y_true) = 0 and the predictions are not perfect; and otherwise computes 1 - SS_res/SS_tot in float64, guarding against division by zero. Test your function on y_true=[3, -1, 2, 7, 5], y_pred=[2.5, -0.5, 2.1, 7.8, 5.2], and on the edge cases y_true=[4,4,4,4] with y_pred=[4,4,4,4] and with y_pred=[4,4,5,3]. Show the numeric outputs and briefly explain them.

(b) Given the 6×3 matrix X below, compute PCA twice: (i) on raw X and (ii) on standardized X (column-wise zero-mean, unit-variance). In each case, center appropriately, compute the covariance matrix, obtain eigenvalues/eigenvectors, sort them by eigenvalue in descending order, report the explained_variance_ratio_ for the first two components, and print the first principal component vector. Discuss how scaling changes the components and why eigenvector signs may flip without changing the subspace.

X = [[10, 200, 0.50], [12, 220, 0.40], [ 9, 210, 0.55], [11, 230, 0.60], [ 8, 190, 0.45], [13, 240, 0.65]]

CVS Health
Oct 13, 2025, 9:49 PM
Data Scientist
Take-home Project
Machine Learning

NumPy-only implementation: R² and PCA (Data Scientist take-home)

Implement from scratch using only NumPy (no scikit-learn). Use float64 throughout and clearly show numeric results where requested.

(a) r2_score(y_true, y_pred)

Write a function r2_score(y_true, y_pred) that:

  • Returns 1.0 if predictions are exactly equal to y_true (elementwise equality).
  • If var(y_true) = 0 (i.e., all y_true are identical) and predictions are not perfect, return -inf.
  • Otherwise compute R² as 1 − SS_res/SS_tot, where:
    • SS_res = sum((y_true − y_pred)²),
    • SS_tot = sum((y_true − mean(y_true))²),
    • Guard against division by zero using the rules above.
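A minimal sketch of r2_score following the rules above, assuming 1-D inputs of equal length. The shape check and the way a constant target is detected (comparing every value with the first one, rather than computing a variance from a mean that may carry rounding error) are choices made for this sketch, not part of the specification.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination with the edge-case rules from part (a)."""
    # Cast to float64 as the task requires; flatten (assumption: inputs are vectors).
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same length")

    # Rule 1: exact elementwise match -> perfect score (also covers a constant y_true).
    if np.array_equal(y_true, y_pred):
        return 1.0

    # Rule 2: var(y_true) == 0 but predictions are not perfect -> -inf.
    # Comparing with the first element detects a constant target exactly.
    if np.all(y_true == y_true[0]):
        return float("-inf")

    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)

    # Defensive guard: ss_tot could still underflow to 0 for extremely tiny spreads.
    if ss_tot == 0.0:
        return float("-inf")
    return float(1.0 - ss_res / ss_tot)


print(r2_score([3, -1, 2, 7, 5], [2.5, -0.5, 2.1, 7.8, 5.2]))
print(r2_score([4, 4, 4, 4], [4, 4, 4, 4]))  # 1.0
print(r2_score([4, 4, 4, 4], [4, 4, 5, 3]))  # -inf
```

Because the two rules are checked before any division, a constant y_true never reaches the 1 - SS_res/SS_tot line.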

Test on:

  • y_true = [3, -1, 2, 7, 5], y_pred = [2.5, -0.5, 2.1, 7.8, 5.2]
  • Edge cases with y_true = [4, 4, 4, 4]:
    • y_pred = [4, 4, 4, 4]
    • y_pred = [4, 4, 5, 3]

Print the numeric outputs and briefly explain each.
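Worked by hand with the SS_res and SS_tot definitions above, the first test case gives the following values (a sketch of the arithmetic behind the printed output):

```latex
\begin{aligned}
\bar{y} &= \tfrac{1}{5}(3 - 1 + 2 + 7 + 5) = 3.2 \\
SS_{tot} &= (-0.2)^2 + (-4.2)^2 + (-1.2)^2 + 3.8^2 + 1.8^2 \\
         &= 0.04 + 17.64 + 1.44 + 14.44 + 3.24 = 36.8 \\
SS_{res} &= 0.5^2 + (-0.5)^2 + (-0.1)^2 + (-0.8)^2 + (-0.2)^2 \\
         &= 0.25 + 0.25 + 0.01 + 0.64 + 0.04 = 1.19 \\
R^2 &= 1 - \frac{1.19}{36.8} \approx 0.9677
\end{aligned}
```

So the predictions leave about 3.2% of the variance of y_true around its mean unexplained. For y_true = [4, 4, 4, 4], SS_tot = 0: identical predictions return 1.0 by the exact-match rule, while y_pred = [4, 4, 5, 3] has SS_res = 0 + 0 + 1 + 1 = 2 > 0 with no variance to normalize by, so the rule returns -inf instead of dividing by zero.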

(b) PCA on raw vs standardized features

Given the 6×3 matrix X:

X = [[10, 200, 0.50], [12, 220, 0.40], [ 9, 210, 0.55], [11, 230, 0.60], [ 8, 190, 0.45], [13, 240, 0.65]]

Compute PCA twice:

  1. On raw X (center columns by their mean before covariance).
  2. On standardized X (column-wise zero-mean, unit-variance; use sample std with ddof=1), then compute PCA on that standardized matrix.
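A minimal sketch of the standardization step, assuming X is held as a float64 NumPy array. It also checks a useful identity: the sample covariance of the standardized matrix equals the correlation matrix of X, so PCA in case 2 amounts to an eigendecomposition of np.corrcoef(X, rowvar=False). The zero-std guard inside the hypothetical standardize helper is an illustrative addition; no column of this X is constant.

```python
import numpy as np

X = np.array([[10, 200, 0.50],
              [12, 220, 0.40],
              [ 9, 210, 0.55],
              [11, 230, 0.60],
              [ 8, 190, 0.45],
              [13, 240, 0.65]], dtype=np.float64)


def standardize(X):
    """Column-wise z-scores using the sample standard deviation (ddof=1)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    # Illustrative guard: a constant column (sd == 0) is left at zero instead of dividing by 0.
    sd = np.where(sd == 0.0, 1.0, sd)
    return (X - mu) / sd


# Raw column scales differ by orders of magnitude: roughly 1.87, 18.7 and 0.094.
print(X.std(axis=0, ddof=1))

Z = standardize(X)
print(Z.mean(axis=0))         # zero up to rounding
print(Z.std(axis=0, ddof=1))  # ones

# Z is already centered, so its covariance (n - 1 divisor) is Z^T Z / (n - 1),
# which matches the correlation matrix of the raw X.
S_z = (Z.T @ Z) / (Z.shape[0] - 1)
print(np.allclose(S_z, np.corrcoef(X, rowvar=False)))  # True
```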

For each case:

  • Center appropriately, compute the covariance matrix S = (X_centered^T X_centered)/(n−1).
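  • Obtain eigenvalues/eigenvectors with np.linalg.eigh (which returns eigenvalues in ascending order) and re-sort them by eigenvalue, descending.
  • Report explained_variance_ratio (each eigenvalue divided by the sum of all eigenvalues) for the first two components.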
  • Print the first principal component vector (the eigenvector for the largest eigenvalue; note that sign is arbitrary).
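A minimal sketch of both PCA runs with np.linalg.eigh, following the steps listed above; the helper name pca_eigh is hypothetical. Since eigh returns eigenvalues in ascending order with eigenvectors as columns, one argsort reorders both, and the explained variance ratio is each eigenvalue divided by their sum.

```python
import numpy as np

X = np.array([[10, 200, 0.50],
              [12, 220, 0.40],
              [ 9, 210, 0.55],
              [11, 230, 0.60],
              [ 8, 190, 0.45],
              [13, 240, 0.65]], dtype=np.float64)


def pca_eigh(data):
    """PCA via eigendecomposition of the sample covariance (hypothetical helper)."""
    n = data.shape[0]
    centered = data - data.mean(axis=0)          # center each column
    S = (centered.T @ centered) / (n - 1)        # sample covariance, shape (p, p)
    evals, evecs = np.linalg.eigh(S)             # ascending eigenvalues; eigenvectors are columns
    order = np.argsort(evals)[::-1]              # indices for descending order
    evals, evecs = evals[order], evecs[:, order]
    ratio = evals / evals.sum()                  # explained variance ratio
    return evals, evecs, ratio


Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized copy (sample std)

results = {}
for name, data in [("raw", X), ("standardized", Z)]:
    evals, evecs, ratio = pca_eigh(data)
    results[name] = (evals, evecs, ratio)
    print(f"{name}: explained variance ratio (PC1, PC2) = {np.round(ratio[:2], 4)}")
    print(f"{name}: PC1 = {np.round(evecs[:, 0], 4)}  (sign is arbitrary)")
```

On raw X the second column (sample variance 350, versus 3.5 and 0.00875 for the other two) dominates: PC1 points almost along that axis and captures more than 99% of the total variance. After standardization every column has variance 1, the eigenvalues sum to 3, and PC1 mixes all three columns with a noticeably smaller share of the total.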

Discuss:

  • How scaling (standardizing) changes the principal components and their explained variance.
  • Why eigenvector signs may flip without changing the subspace.
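A short check of the sign argument on the raw X: if S v = λ v then S(-v) = λ(-v), so eigh may legitimately return either sign. Both vectors span the same line (the projector v vᵀ is identical), and the PC scores merely change sign while keeping the same variance. The fix_sign helper is a hypothetical convention (make the largest-magnitude entry of each component positive) for reproducible printouts; the task does not require it.

```python
import numpy as np

X = np.array([[10, 200, 0.50],
              [12, 220, 0.40],
              [ 9, 210, 0.55],
              [11, 230, 0.60],
              [ 8, 190, 0.45],
              [13, 240, 0.65]], dtype=np.float64)

Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / (X.shape[0] - 1)
evals, evecs = np.linalg.eigh(S)
k = np.argmax(evals)
lam, v = evals[k], evecs[:, k]  # first principal component

# v and -v are both eigenvectors for the same eigenvalue.
print(np.allclose(S @ v, lam * v), np.allclose(S @ (-v), lam * (-v)))  # True True

# The orthogonal projector onto span{v} does not depend on the sign.
print(np.array_equal(np.outer(v, v), np.outer(-v, -v)))  # True

# Scores flip sign; their sample variance (which equals lam) is unchanged.
scores = Xc @ v
print(np.isclose(scores.var(ddof=1), lam), np.isclose((-scores).var(ddof=1), lam))  # True True


def fix_sign(vectors):
    """Hypothetical convention: flip each column so its largest-|entry| is positive."""
    rows = np.argmax(np.abs(vectors), axis=0)
    signs = np.sign(vectors[rows, np.arange(vectors.shape[1])])
    return vectors * signs  # broadcasting scales column j by signs[j]


print(fix_sign(evecs))
```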

© 2026 PracHub. All rights reserved.