Solve core probability/statistics mini-problems
Company: Upstart
Role: Data Scientist
Category: Statistics & Math
Difficulty: medium
Interview Round: Onsite
Answer the following probability/statistics interview questions. Assume all randomness is independent unless stated otherwise.
1) **Radioactive decay (half-life):** A radioactive atom has a half-life of 1 day. You start with **n = 100** identical atoms.
- (a) What is the probability a given atom is still undecayed after **m = 10** days?
- (b) What is the distribution of the number of atoms still undecayed after 10 days?
- (c) Compute the expected number of atoms remaining after 10 days, and the probability that **at least one** atom remains.
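One way to check the arithmetic in question 1 is the short sketch below. It assumes each atom independently survives each half-life with probability 1/2, so the count of survivors is Binomial(n, p) with p = (1/2)^m. The variable names are illustrative only.

```python
# Sketch for question 1: survival over m half-lives, assuming independent atoms.
n = 100   # number of atoms at the start
m = 10    # elapsed time in days (the half-life is 1 day)

# (a) One atom survives each half-life with probability 1/2.
p_survive = 0.5 ** m

# (b) The survivor count is then Binomial(n, p_survive).
# (c) Mean of that binomial, and P(at least one survivor) = 1 - P(none survive).
expected_remaining = n * p_survive
p_at_least_one = 1.0 - (1.0 - p_survive) ** n

print(f"p_survive          = {p_survive:.6f}")
print(f"expected_remaining = {expected_remaining:.6f}")
print(f"p_at_least_one     = {p_at_least_one:.6f}")
```

Because p is small, the probability that at least one atom remains sits just below the expected count, as the union bound suggests.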
2) **Bayes’ rule (generic form):** Let event \(A\) be the “true condition” and event \(B\) be an observed test result. You are given \(P(A)\), \(P(B \mid A)\), and \(P(B \mid A^c)\). Derive \(P(A \mid B)\).
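The derivation for question 2 can be written in two steps using only the three given quantities and the complement rule \(P(A^c) = 1 - P(A)\), assuming \(P(B) > 0\). First expand \(P(B)\) by the law of total probability, then apply the definition of conditional probability:

\[
P(B) = P(B \mid A)\,P(A) + P(B \mid A^c)\,\bigl(1 - P(A)\bigr)
\]

\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,\bigl(1 - P(A)\bigr)}
\]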
3) **OLS coefficients in two regressions:** Let \(y = x + e\) where \(x \sim \mathcal N(0,1)\), \(e \sim \mathcal N(0,1)\), and \(x\) and \(e\) are independent.
- (a) In the population OLS regression of \(y\) on \(x\) (with intercept), what is the slope coefficient?
- (b) In the population OLS regression of \(x\) on \(y\) (with intercept), what is the slope coefficient?
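The two slopes in question 3 can be checked numerically with a small simulation. The sketch below uses the textbook identity that a simple OLS slope (with intercept) equals Cov(regressor, response) / Var(regressor). The sample size, seed, and the helper name `ols_slope` are assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Draw from the model y = x + e with x and e independent standard normals.
num_samples = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(num_samples)]
ys = [x + random.gauss(0.0, 1.0) for x in xs]


def ols_slope(response, regressor):
    """Slope of a simple OLS fit with intercept: Cov(regressor, response) / Var(regressor)."""
    count = len(regressor)
    mean_r = sum(regressor) / count
    mean_y = sum(response) / count
    cov = sum((r - mean_r) * (y - mean_y) for r, y in zip(regressor, response))
    var = sum((r - mean_r) ** 2 for r in regressor)
    return cov / var


slope_y_on_x = ols_slope(ys, xs)  # population value: Cov(x, y) / Var(x) = 1 / 1
slope_x_on_y = ols_slope(xs, ys)  # population value: Cov(x, y) / Var(y) = 1 / 2

print(f"y on x: {slope_y_on_x:.3f}")
print(f"x on y: {slope_x_on_y:.3f}")
```

The two slopes are not reciprocals of each other: their product is the squared correlation between \(x\) and \(y\), which here is 1/2.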
4) **Monty Hall:** You pick 1 of 3 doors. The host, who knows where the prize is, opens a different door showing no prize, then offers you the chance to switch to the remaining closed door. What strategy maximizes your win probability, and what is that probability?
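A quick simulation is a common way to sanity-check the Monty Hall answer. The sketch below makes one assumption the question leaves open: when the host has two prize-free doors to choose from, he opens one uniformly at random. The function names `play` and `win_rate` are hypothetical.

```python
import random


def play(switch, rng):
    """Play one round; return True if the final choice hides the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the prize
    # (assumed uniform at random when two such doors exist).
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won under the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay:   {stay_rate:.3f}")
print(f"switch: {switch_rate:.3f}")
```

Switching wins exactly when the first pick was wrong, which happens with probability 2/3, so the simulated rates should land near 1/3 for staying and 2/3 for switching.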
5) **n-sided die / coupon collector:** You repeatedly roll a fair **n-sided** die. What is the expected number of rolls required to have seen **every face at least once**?
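The standard argument for question 5 splits the process into stages: once k distinct faces have been seen, the wait for a new face is geometric with success probability (n - k)/n. The sketch below sums those stage means with exact rational arithmetic. The function name `expected_rolls` is an illustrative choice.

```python
from fractions import Fraction


def expected_rolls(n):
    """Expected rolls to see all n faces: sum of n / (n - k) for k = 0..n-1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # With k faces already seen, a new face appears with probability (n - k) / n,
    # so the wait for it is geometric with mean n / (n - k).
    return sum(Fraction(n, n - k) for k in range(n))


for faces in (2, 6, 20):
    print(faces, float(expected_rolls(faces)))
```

The sum equals \(n H_n\), where \(H_n = 1 + 1/2 + \dots + 1/n\) is the n-th harmonic number. For a standard six-sided die this gives 14.7 rolls.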
6) **Likelihood:** In parametric modeling, explain what a **likelihood** is and how it differs from a probability statement.
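A small numerical example may help make the distinction in question 6 concrete. The sketch below fixes hypothetical data (7 successes in 10 Bernoulli trials, chosen only for illustration) and treats the binomial probability of that data as a function of the parameter \(\theta\).

```python
from math import comb

n_trials, k_successes = 10, 7  # hypothetical observed data, held fixed


def likelihood(theta):
    """Binomial probability of the fixed data, viewed as a function of theta."""
    n, k = n_trials, k_successes
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


# Evaluate the likelihood on a grid of candidate parameter values.
grid = [i / 100 for i in range(101)]
values = [likelihood(theta) for theta in grid]

# The maximizer on the grid is the maximum-likelihood estimate k / n.
theta_hat = grid[values.index(max(values))]

# Riemann-sum approximation of the area under the likelihood over theta in [0, 1].
# It is not 1, because the likelihood is not a probability density in theta.
area_over_theta = sum(values) / 100

print(theta_hat, round(area_over_theta, 4))
```

For a fixed \(\theta\), the same formula summed over all possible success counts gives 1, which is a probability statement about data. Over \(\theta\) with the data fixed, the area here is 1/11, so the likelihood ranks parameter values without being a distribution over them.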
Quick Answer: This prompt tests core probability and statistics concepts relevant to Data Scientist roles: exponential decay and discrete survival counts, Bayesian updating, properties of population OLS coefficients, conditional-probability reasoning (Monty Hall), the coupon-collector expected waiting time, and the conceptual distinction between likelihood and probability. It is commonly asked because it probes distributional intuition, independence and conditional inference, and expectation and estimation at an introductory-to-intermediate theoretical level, all of which underpin applied modeling and the interpretation of experiments.