PracHub

Analyze duplicating data in linear regression

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of linear regression theory and statistical inference, specifically how duplicating observations affects OLS coefficient estimates, estimated standard errors, t-statistics, R^2, and adjusted R^2.

  • medium
  • Gts
  • Machine Learning
  • Data Scientist


Company: Gts

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Take-home Project


Sep 18, 2025, 12:00 AM

You fit a standard linear regression model (with intercept) using ordinary least squares (OLS). Suppose you have:

  • Design matrix \(X\) of size \(n \times p\) (\(p\) parameters including the intercept).
  • Response vector \(y\) of length \(n\).

You now duplicate every observation once, forming a new dataset by stacking the original data under itself:

  • New design matrix \(X^* = \begin{bmatrix} X \\ X \end{bmatrix}\) of size \(2n \times p\).
  • New response vector \(y^* = \begin{bmatrix} y \\ y \end{bmatrix}\) of length \(2n\).
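
Because the stacked data are two copies of the same rows, the cross-products that OLS depends on can be written down directly from these definitions. The following is a sketch that uses only the block structure above; the transpose notation \(^\top\) is introduced here and does not appear in the original statement.

```latex
\[
X^{*\top} X^{*} = X^\top X + X^\top X = 2\,X^\top X,
\qquad
X^{*\top} y^{*} = X^\top y + X^\top y = 2\,X^\top y .
\]
```

Each cross-product simply doubles, which is a natural starting point for comparing the two fits.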

You refit the same regression model on this duplicated dataset using OLS and compute the usual summary statistics.

How do the following quantities change, if at all, compared with the original fit?

  1. The OLS coefficient estimates \(\hat{\beta}\).
  2. The standard errors of the coefficients.
  3. The t-statistics for the coefficients.
  4. \(R^2\).
  5. Adjusted \(R^2\).

Explain your reasoning mathematically (you may use matrix notation) and also interpret the result intuitively.
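
One way to sanity-check a written answer is to run the experiment numerically. The sketch below is not part of the original question: the helper `ols_summary`, the simulated data, and the sizes `n = 50`, `p = 3` are all hypothetical choices made for illustration. It assumes the "usual summary statistics" use the residual variance estimate RSS / (n - p) and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p).

```python
import numpy as np


def ols_summary(X, y):
    """Hypothetical helper: OLS fit plus the usual summary statistics.

    X is assumed to already contain the intercept column.
    """
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    rss = resid @ resid                      # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
    sigma2 = rss / (n - p)                   # assumed variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))  # coefficient standard errors
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return {"beta": beta, "se": se, "t": beta / se, "r2": r2, "adj_r2": adj_r2}


# Simulated data (illustrative only).
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Duplicate every observation once by stacking the data under itself.
X_star = np.vstack([X, X])
y_star = np.concatenate([y, y])

orig = ols_summary(X, y)
dup = ols_summary(X_star, y_star)

# Ratio implied by the assumed variance estimate: sqrt((n - p) / (2n - p)).
expected_se_ratio = np.sqrt((n - p) / (2 * n - p))

print("beta unchanged:   ", np.allclose(orig["beta"], dup["beta"]))
print("SE ratio:         ", dup["se"] / orig["se"], "expected", expected_se_ratio)
print("t ratio:          ", dup["t"] / orig["t"])
print("R^2 (orig, dup):  ", orig["r2"], dup["r2"])
print("adj R^2 (orig, dup):", orig["adj_r2"], dup["adj_r2"])
```

Under these assumptions, the coefficients and R^2 should match between the two fits, the reported standard errors should shrink by the factor sqrt((n - p)/(2n - p)), the t-statistics should grow by the reciprocal factor, and adjusted R^2 should move slightly upward. The duplicated rows carry no new information, so the apparent gain in precision comes only from the inflated degrees of freedom.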

