Linear Regression, p-values, and Chi-square with Large Samples
Context
You are analyzing regression and goodness-of-fit results. Consider what happens if you mechanically duplicate each row of your dataset (same X and y repeated once), how to interpret p-values in practice, and how very large samples affect chi-square tests.
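The duplication scenario can be sketched numerically. The block below is a minimal illustration, not a reference solution: the helper `ols`, the simulated data, and the seed are all assumptions introduced here. It fits ordinary least squares with classical (homoskedastic) standard errors on a dataset and on the same dataset stacked on itself. Under those assumptions the coefficients match exactly, and each standard error shrinks by the factor sqrt((n - p) / (2n - p)), which approaches 1/sqrt(2) as n grows.

```python
import numpy as np

def ols(X, y):
    """Hypothetical helper: OLS coefficients and classical standard errors."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # unbiased residual variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))   # classical standard errors
    return beta, se

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta, se = ols(X, y)

# Duplicate every row once: X'X and X'y both double, so beta is unchanged.
X_dup = np.vstack([X, X])
y_dup = np.concatenate([y, y])
beta_dup, se_dup = ols(X_dup, y_dup)

ratio = se_dup / se
expected_ratio = np.sqrt((n - p) / (2 * n - p))

print("coefficients equal:", np.allclose(beta, beta_dup))
print("SE ratio (duplicated / original):", ratio)
print("predicted ratio:", expected_ratio)
```

The shrinkage is spurious: the duplicated rows carry no new information, so the smaller standard errors overstate the precision of the estimates.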
Questions
- If every observation in a linear regression dataset is duplicated (each row repeated once), how do the coefficient estimates and their standard errors change? Show the math.
- In practical terms, what does a p-value represent, and what common misinterpretations should be avoided?
- How does a very large sample size influence a chi-square test, and what penalty/adjustment can keep results interpretable?
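The large-sample question can be explored with a small experiment. The sketch below makes several assumptions: the 2x2 table values are made up, the helper `chi_square_2x2` is hypothetical, and Cramér's V is used as one common effect-size companion to the test (the source does not name a specific adjustment). It scales the same table proportions up by fixed factors, so the association strength never changes while the sample size does. The p-value for one degree of freedom is computed from the identity P(chi2 > x) = erfc(sqrt(x / 2)).

```python
import math
import numpy as np

def chi_square_2x2(table):
    """Hypothetical helper: Pearson chi-square (no continuity correction),
    its p-value for df = 1, and Cramér's V for a 2x2 table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    stat = ((table - expected) ** 2 / expected).sum()
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    # Cramér's V divides out the sample size, so it does not grow with n.
    cramers_v = math.sqrt(stat / (n * (min(table.shape) - 1)))
    return stat, p_value, cramers_v

# Made-up table with a very weak association (51% vs 49%).
base = np.array([[51, 49], [49, 51]])

results = []
for scale in (1, 100, 10_000):
    stat, p_value, v = chi_square_2x2(base * scale)
    results.append((scale, stat, p_value, v))
    print(f"n = {200 * scale:>9,d}  chi2 = {stat:10.2f}  p = {p_value:.3g}  V = {v:.3f}")
```

In this setup the chi-square statistic grows in proportion to the sample size and the p-value collapses toward zero, while Cramér's V stays constant, which is why reporting an effect size alongside the test helps separate statistical significance from practical importance.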