Statistical Inference: Hypothesis Tests, Confidence Intervals, Sampling Design, and Truncated Normal Estimation
Context
You are evaluating a set of practical statistical tasks common in data science interviews. Assume i.i.d. sampling unless stated otherwise, and use standard large-sample approximations when appropriate.
Tasks
- Hypothesis test: You test whether a population mean differs from 0 (two-sided). What does a p-value of x% mean in this context?
- Confidence interval for a mean: Given sample mean x̄ = 1 and standard error SE = 0.1, construct the 95% confidence interval for the population mean. State any assumptions.
- Targeting a smaller SE: By what factor must the sample size increase to reduce the SE from 0.1 to 0.01? Give the general formula and state the new sample size in terms of the current sample size.
- Fixed sample size: If you cannot increase the sample size, what actions can you take to improve inference (e.g., narrower interval, more power)?
- Tail probability estimation: Given independent observations X₁,…,Xₙ from the distribution of X, propose an estimator for p = P(X > 10). Construct a 95% confidence interval for p and interpret a resulting interval [a, b] in terms of the true probability p.
- Estimating an overall conversion rate with 1,000 binary features: You wish to estimate the overall conversion rate in a population where each unit has 1,000 binary features. Describe an estimation/sampling strategy that is efficient and yields an unbiased (or approximately unbiased) estimate of the overall rate.
- Truncated normal: Assume X ∼ N(μ, σ²), but you observe X only when X > 3, i.e. Y = X | X > 3 (left-truncated at 3). How would you estimate μ and σ²? How would you construct 95% confidence intervals for μ and σ² under this truncation?
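
The numeric parts of the tasks above can be checked with a short sketch. This is one possible approach and not a reference solution. It assumes the large-sample normal approximation throughout, and it uses a Wald interval for the tail probability, which can be poor when p is close to 0 or 1. All function names (mean_ci, sample_size_factor, tail_prob_ci, truncated_normal_nll) and the sample data are hypothetical and chosen only for illustration.

```python
from math import log, sqrt
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def mean_ci(xbar, se, z=Z95):
    """Normal-approximation confidence interval: xbar +/- z * SE."""
    return xbar - z * se, xbar + z * se


def sample_size_factor(se_current, se_target):
    """SE scales as 1/sqrt(n), so n_new / n_old = (SE_old / SE_new)^2."""
    return (se_current / se_target) ** 2


def tail_prob_ci(xs, threshold=10.0, z=Z95):
    """Estimate p = P(X > threshold) by the sample proportion, with a Wald CI."""
    n = len(xs)
    p_hat = sum(x > threshold for x in xs) / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] because a probability cannot lie outside that range.
    return p_hat, max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


def truncated_normal_nll(mu, sigma, ys, lower=3.0):
    """Negative log-likelihood of N(mu, sigma^2) left-truncated at `lower`.

    Each observation y > lower has density f(y) / (1 - F(lower)), where f and F
    are the untruncated normal pdf and cdf.
    """
    dist = NormalDist(mu, sigma)
    log_density_sum = sum(log(dist.pdf(y)) for y in ys)
    return -log_density_sum + len(ys) * log(1.0 - dist.cdf(lower))


# Illustrative usage with the numbers from the tasks and made-up data.
low, high = mean_ci(1.0, 0.1)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
print("Sample size factor:", sample_size_factor(0.1, 0.01))
print("Tail estimate and CI:", tail_prob_ci([12, 3, 11, 9, 15, 2, 8, 10, 20, 1]))
```

For the truncated normal task, minimizing truncated_normal_nll over (mu, sigma) with a numerical optimizer gives the maximum likelihood estimates. Approximate 95% intervals can then be taken from the inverse of the observed information matrix, or from a profile likelihood or bootstrap, assuming the usual large-sample conditions hold.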