Estimate and Derive Regression Coefficient for X on y
Company: Upstart
Role: Data Scientist
Category: Statistics & Math
Difficulty: medium
Interview Round: Onsite
##### Scenario
Onsite round focused on statistics and probability for a research-scientist role
##### Question
1. The data-generating process is y = X + ε, where X ~ N(0, 1) and ε ~ N(0, 1) are independent. If you instead regress X on y using ordinary least squares, what is the regression coefficient and how do you derive it?
2. In a village, every family has either 1, 2, or 3 children. A random sample of 100 children yields: 50 came from 1-child families, 30 from 2-child families, and 20 from 3-child families.
(a) Estimate the proportion of 1-child families in the village.
(b) Construct a 95% confidence interval for that proportion.
(c) How could you obtain an exact interval by modeling the family-size proportions with a Dirichlet prior and deriving the posterior credible interval?
##### Hints
Use β = Cov(X, y) / Var(y) for Q1. For Q2, convert child counts to family counts via Bayes or likelihood equations, then apply multinomial/Dirichlet formulas.
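A minimal sketch of the Q1 hint, not a definitive solution. Because X and ε are independent, Cov(X, y) = Cov(X, X + ε) = Var(X) = 1 and Var(y) = Var(X) + Var(ε) = 2, so the population slope of X on y is 1/2 (with zero intercept, since both means are 0). The simulation below checks this numerically; the sample size, seed, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n = 200_000                     # illustrative sample size (an assumption)

# Simulate the stated data-generating process y = X + eps.
x = rng.standard_normal(n)
eps = rng.standard_normal(n)
y = x + eps

# OLS slope of X on y: sample Cov(X, y) divided by sample Var(y).
cov_xy = np.cov(x, y)[0, 1]     # np.cov uses ddof=1 by default
var_y = np.var(y, ddof=1)
beta_x_on_y = cov_xy / var_y

# Cross-check with a least-squares line fit of x against y (with intercept).
slope, intercept = np.polyfit(y, x, deg=1)

print(f"Cov/Var slope: {beta_x_on_y:.4f}")
print(f"polyfit slope: {slope:.4f}, intercept: {intercept:.4f}")
```

Both slope estimates should land close to 0.5, noticeably below the coefficient of 1 in the forward model, because Var(y) includes the noise variance.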
Quick Answer: This question evaluates understanding of linear regression coefficient identification from a simple generative model and the statistical inference of population proportions under biased sampling, testing skills in parameter estimation, recognition of sampling mechanisms, and both frequentist and Bayesian interval estimation.
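One possible sketch of Q2, under stated assumptions rather than a reference solution. Sampling children over-represents large families: if p_k is the share of families with k children, a random child comes from a k-child family with probability q_k = k·p_k / Σ_j j·p_j, so p_k ∝ q_k / k. The code treats the 100 children as a multinomial draw (ignoring siblings sampled together), uses a delta-method Wald interval for (b), and for (c) places a flat Dirichlet(1, 1, 1) prior on the child-level proportions q rather than on p directly, which is a simplification. The helper name `child_to_family`, the seed, and the number of draws are hypothetical choices.

```python
import numpy as np

counts = np.array([50, 30, 20])  # sampled children from 1-, 2-, 3-child families
sizes = np.array([1, 2, 3])
n = counts.sum()

def child_to_family(q):
    """Undo size bias: map child-level shares q_k to family-level shares p_k."""
    w = q / sizes
    return w / w.sum(axis=-1, keepdims=True)

# (a) Plug-in estimate of the family-level proportions.
q_hat = counts / n
p_hat = child_to_family(q_hat)   # p_hat[0] = 30/43, about 0.698

# (b) Delta-method 95% interval for p_1 (multinomial sampling assumed).
d = (q_hat / sizes).sum()
grad = (np.eye(3)[0] * d - q_hat[0] / sizes) / d**2   # d p_1 / d q_j
cov_q = (np.diag(q_hat) - np.outer(q_hat, q_hat)) / n
se_p1 = np.sqrt(grad @ cov_q @ grad)
wald_ci = (p_hat[0] - 1.96 * se_p1, p_hat[0] + 1.96 * se_p1)

# (c) Posterior credible interval: Dirichlet prior on q (an assumption),
# posterior Dirichlet(counts + 1), transformed draw by draw to p_1.
rng = np.random.default_rng(0)
q_draws = rng.dirichlet(counts + 1.0, size=100_000)
p1_draws = child_to_family(q_draws)[:, 0]
cred_ci = tuple(np.quantile(p1_draws, [0.025, 0.975]))

print(f"p1 estimate: {p_hat[0]:.3f}")
print(f"delta-method 95% CI: ({wald_ci[0]:.3f}, {wald_ci[1]:.3f})")
print(f"posterior 95% credible interval: ({cred_ci[0]:.3f}, {cred_ci[1]:.3f})")
```

The naive answer of 0.5 for part (a) ignores the size bias; after reweighting, 1-child families make up roughly 70% of families. Putting the Dirichlet prior on p itself would give a non-Dirichlet posterior and would need a different sampler, which this sketch does not attempt.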