PracHub

Estimate and Derive Regression Coefficient for X on y

Last updated: Mar 29, 2026

Quick Overview

This question tests two things: identifying a linear regression coefficient from a simple generative model, and inferring population proportions under biased sampling. It exercises parameter estimation, recognition of the sampling mechanism, and both frequentist and Bayesian interval estimation.



Company: Upstart

Role: Data Scientist

Category: Statistics & Math

Difficulty: medium

Interview Round: Onsite

Scenario

Onsite round focused on statistics and probability for a research-scientist role.

Question

  1. The data-generating process is y = X + ε, where X ~ N(0, 1) and ε ~ N(0, 1) are independent. If you instead regress X on y using ordinary least squares, what is the regression coefficient and how do you derive it?
  2. In a village every family has either 1, 2, or 3 children. A random sample of 100 children yields: 50 came from 1-child families, 30 from 2-child families, and 20 from 3-child families.
    (a) Estimate the proportion of 1-child families in the village.
    (b) Construct a 95% confidence interval for that proportion.
    (c) How could you obtain an exact interval by modeling the family-size proportions with a Dirichlet prior and deriving the posterior credible interval?

Hints

  • Use β = Cov(X, y) / Var(y) for Q1.
  • For Q2, convert child counts to family counts via Bayes or likelihood equations, then apply multinomial/Dirichlet formulas.


Aug 4, 2025, 10:55 AM

Statistics & Probability Onsite — Two-Part Question

Context

  • You have a simple linear data-generating process: y = X + ε, where X and ε are independent standard normals.
  • Separately, you are surveying a village where each family has 1, 2, or 3 children. Your sample is drawn uniformly at random from children (not families).

Questions

  1. Regress X on y (ordinary least squares with intercept). What is the regression coefficient β and how do you derive it?
  2. In a village, every family has 1, 2, or 3 children. You randomly sample 100 children and observe:
    • 50 from 1-child families
    • 30 from 2-child families
    • 20 from 3-child families
    Let π = (π1, π2, π3) be the proportions of families with 1, 2, and 3 children in the village. Because you sampled children, the observed proportions of children from each family size are not equal to π. Answer:
    (a) Estimate the proportion π1 of 1-child families.
    (b) Construct a 95% confidence interval for π1.
    (c) Describe how to obtain an “exact” Bayesian interval by using a Dirichlet prior and deriving a posterior credible interval for π1.
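
For Question 1, the ratio Cov(X, y) / Var(y) can be worked out by hand: Cov(X, y) = Cov(X, X + ε) = Var(X) = 1, and Var(y) = Var(X) + Var(ε) = 2 by independence, so β = 1/2. The sketch below is only a numerical sanity check of that value, not part of the original question. The helper name `ols_slope`, the seed, and the sample size are illustrative assumptions.

```python
import random


def ols_slope(x, y):
    """Slope from regressing x on y with an intercept: Cov(x, y) / Var(y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return cov_xy / var_y


# Simulate y = X + eps with independent standard normals (assumed seed and size).
rng = random.Random(0)
n = 100_000
X = [rng.gauss(0.0, 1.0) for _ in range(n)]
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [x + e for x, e in zip(X, eps)]

# Regress X on y; the estimate should land close to the derived value 1/2.
beta_hat = ols_slope(X, y)
print(round(beta_hat, 3))
```

Note that regressing y on X would give a slope of 1, so the two regressions are not inverses of each other: the slope of X on y is 1/2, not 1/1.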

Hints

  • For Q1, use β = Cov(X, y) / Var(y).
  • For Q2, if q_k is the child-based proportion observed for family size k, then q_k ∝ k π_k. Convert q to π via π_k ∝ q_k / k, then normalize. For (c), use a Dirichlet prior on child-based probabilities and transform to π.
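
The Q2 hint can be sketched end to end as follows. With q = (0.5, 0.3, 0.2), the weights q_k / k are (0.5, 0.15, 0.0667), so π1 = 30/43 ≈ 0.698. The code below is a minimal illustration under stated assumptions, not a definitive solution: it treats the 100 children as independent multinomial draws (ignoring that siblings share a family), uses a delta-method normal approximation for part (b), and for part (c) assumes a uniform Dirichlet(1, 1, 1) prior on the child-based probabilities with Monte Carlo quantiles. The names `to_family_props` and `sample_dirichlet`, the seed, and the number of draws are hypothetical choices.

```python
import math
import random

counts = [50, 30, 20]   # children observed from 1-, 2-, 3-child families
sizes = [1, 2, 3]
n = sum(counts)


def to_family_props(q):
    """Map child-based proportions q_k to family proportions (q_k / k, normalized)."""
    w = [qk / k for qk, k in zip(q, sizes)]
    total = sum(w)
    return [wk / total for wk in w]


# (a) Point estimate of pi from the observed child-based proportions.
q_hat = [c / n for c in counts]
pi_hat = to_family_props(q_hat)

# (b) Delta method for pi_1 = g(q) = q_1 / S, with S = sum_k q_k / k.
# Gradient: dg/dq_k = [k == 1] / S - q_1 / (k * S^2).
S = sum(qk / k for qk, k in zip(q_hat, sizes))
grad = [-q_hat[0] / (k * S ** 2) for k in sizes]
grad[0] += 1.0 / S
# Multinomial covariance gives Var(g) ~ (sum q g^2 - (sum q g)^2) / n.
m1 = sum(qk * gk for qk, gk in zip(q_hat, grad))
m2 = sum(qk * gk ** 2 for qk, gk in zip(q_hat, grad))
se = math.sqrt((m2 - m1 ** 2) / n)
ci = (pi_hat[0] - 1.96 * se, pi_hat[0] + 1.96 * se)


# (c) Posterior for q is Dirichlet(prior + counts); transform each draw to pi_1.
def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


rng = random.Random(0)
prior = [1.0, 1.0, 1.0]  # assumed uniform prior; not specified in the question
posterior = [a + c for a, c in zip(prior, counts)]
draws = sorted(
    to_family_props(sample_dirichlet(posterior, rng))[0] for _ in range(20_000)
)
cred = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws)) - 1])

print("pi_hat:", [round(p, 4) for p in pi_hat])
print("95% delta-method CI for pi_1:", tuple(round(v, 3) for v in ci))
print("95% credible interval for pi_1:", tuple(round(v, 3) for v in cred))
```

The credible interval here is "exact" only in the sense that it avoids the normal approximation; its endpoints still carry Monte Carlo error, which shrinks as the number of draws grows.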

