Estimate and Derive Regression Coefficient for X on y
Upstart
Aug 4, 2025, 10:55 AM
Data Scientist
Onsite
Statistics & Math
Statistics & Probability Onsite — Two-Part Question
Context
You have a simple linear data-generating process: y = X + ε, where X and ε are independent standard normals.
Separately, you are surveying a village where each family has 1, 2, or 3 children. Your sample is drawn uniformly at random from children (not families).
Questions
Q1. Regress X on y (ordinary least squares with intercept). What is the regression coefficient β and how do you derive it?
Q2. In a village, every family has 1, 2, or 3 children. You randomly sample 100 children and observe:
50 from 1-child families
30 from 2-child families
20 from 3-child families
Let π = (π1, π2, π3) be the proportions of families with 1, 2, and 3 children in the village. Because you sampled children, the observed proportions of children from each family size are not equal to π. Answer:
(a) Estimate the proportion π1 of 1-child families.
(b) Construct a 95% confidence interval for π1.
(c) Describe how to obtain an “exact” Bayesian interval by using a Dirichlet prior and deriving a posterior credible interval for π1.
Hints
For Q1, use β = Cov(X, y) / Var(y).
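Applying the hint to the stated model: Cov(X, y) = Cov(X, X + ε) = Var(X) = 1 and Var(y) = Var(X) + Var(ε) = 2, so β = 1/2, and the intercept is E[X] − β·E[y] = 0. The Python sketch below is only a simulation check of that value, not part of the original question. The function name `simulate_beta`, the sample size, and the seed are illustrative assumptions.

```python
import numpy as np


def simulate_beta(n=200_000, seed=0):
    """Simulate y = X + eps and return the OLS slope and intercept of X on y.

    n and seed are arbitrary illustrative choices.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)      # X ~ N(0, 1)
    eps = rng.standard_normal(n)    # eps ~ N(0, 1), independent of X
    y = x + eps

    # OLS slope with an intercept: sample Cov(X, y) / sample Var(y)
    cov_xy = np.cov(x, y, ddof=1)[0, 1]
    var_y = np.var(y, ddof=1)
    slope = cov_xy / var_y

    # OLS intercept: mean(X) - slope * mean(y)
    intercept = x.mean() - slope * y.mean()
    return slope, intercept


slope, intercept = simulate_beta()
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

The simulated slope should land close to 0.5 and the intercept close to 0, up to sampling noise. The slope is below 1 even though y = X + ε has a coefficient of 1 on X, because regressing in the reverse direction divides by Var(y), which includes the noise variance.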
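For Q2, let q_k be the child-based proportion observed for family size k. A family with k children contributes k children to the sampling pool, so q_k ∝ k π_k. Convert q to π via π_k ∝ q_k / k, then normalize so that the π_k sum to 1. For (c), place a Dirichlet prior on the child-based probabilities q, update it with the observed counts, and transform the posterior for q to π.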
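The conversion in the Q2 hint can be sketched in Python as follows. This is one possible approach under stated assumptions, not a reference solution: the 100 sampled children are treated as independent multinomial draws, part (b) uses a delta-method normal approximation (the question does not prescribe a method), and part (c) assumes a uniform Dirichlet(1, 1, 1) prior on the child-based probabilities. The helper name `child_to_family`, the number of posterior draws, and the seed are illustrative.

```python
import numpy as np


def child_to_family(q):
    """Map child-based proportions q_k to family-based pi_k via pi_k ∝ q_k / k."""
    q = np.asarray(q, dtype=float)
    sizes = np.arange(1, q.shape[-1] + 1)   # family sizes 1, 2, 3
    w = q / sizes
    return w / w.sum(axis=-1, keepdims=True)


counts = np.array([50, 30, 20])   # children observed from 1-, 2-, 3-child families
n = counts.sum()
q_hat = counts / n

# (a) Point estimate of pi: reweight by 1/k, then normalize.
pi_hat = child_to_family(q_hat)

# (b) Delta-method 95% interval for pi_1 = q_1 / S, with S = sum_k q_k / k.
sizes = np.array([1.0, 2.0, 3.0])
S = np.sum(q_hat / sizes)
grad = -q_hat[0] / (sizes * S**2)   # d pi_1 / d q_k = -q_1 / (k S^2) for every k ...
grad[0] += 1.0 / S                  # ... plus 1/S for k = 1
cov_q = (np.diag(q_hat) - np.outer(q_hat, q_hat)) / n   # multinomial covariance of q_hat
se = np.sqrt(grad @ cov_q @ grad)
ci_delta = (pi_hat[0] - 1.96 * se, pi_hat[0] + 1.96 * se)

# (c) Bayesian interval: Dirichlet(1, 1, 1) prior on q (an assumption),
# posterior Dirichlet(1 + counts), with each posterior draw of q mapped to pi_1.
rng = np.random.default_rng(0)
q_draws = rng.dirichlet(1.0 + counts, size=100_000)
pi1_draws = child_to_family(q_draws)[:, 0]
ci_bayes = np.percentile(pi1_draws, [2.5, 97.5])

print("pi_hat:", np.round(pi_hat, 4))
print("delta-method 95% CI for pi_1:", np.round(ci_delta, 4))
print("posterior 95% credible interval for pi_1:", np.round(ci_bayes, 4))
```

With the observed counts, the point estimate is π1 = 0.5 / (0.5 + 0.3/2 + 0.2/3) = 30/43 ≈ 0.70, noticeably higher than the raw 50% of sampled children, because larger families are over-represented when sampling children. The credible interval in (c) is "exact" only in the sense that it avoids the normal approximation; it still depends on the chosen prior and carries Monte Carlo error from the finite number of draws.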