Question
This is a statistics rapid-fire onsite for a Data Scientist role. Answer each part clearly and precisely — first as if explaining to a Product Manager, then for a technical audience.
Part A — p-values
1. Explain what a p-value is in plain language (PM-friendly).
2. Give the formal definition of a p-value.
3. How should a p-value be interpreted, and what are the common misinterpretations to avoid?
4. If you run an A/B test and obtain p = 0.03 for the primary metric, what decision would you make? What additional context would you request before shipping?
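For reference on Part A, here is a minimal sketch of where an A/B-test p-value can come from, assuming the primary metric is a conversion rate and a pooled two-proportion z-test is appropriate. The function name and the counts in the usage line are hypothetical and are not part of the question.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: both arms share one conversion rate.

    Uses a pooled two-proportion z-test (normal approximation), which is
    an assumption of this sketch and needs reasonably large samples.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # P(|Z| >= |z|) for a standard normal Z, i.e. 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 1000/10000 conversions in control, 1100/10000 in treatment
z, p = two_proportion_p_value(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The number returned is the probability, under the null hypothesis, of a difference at least this extreme. It is not the probability that the null hypothesis is true, which is the distinction item 3 asks about.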
Part B — Linear regression with confounding
5. You fit a linear regression with multiple features and suspect confounding factors exist. How do you interpret each parameter (coefficient)?
6. What does “controlling for other variables” actually mean, and when can that interpretation fail?
7. What is the difference between an associational and a causal interpretation of a coefficient?
8. What checks or approaches would you use to reduce confounding bias?
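To make Part B concrete, the following is a small simulation sketch, not a prescribed answer. It assumes a single observed confounder `z` that drives both the feature `x` and the outcome `y`, while `x` itself has no effect on `y`. All variable names and the data-generating process are hypothetical. "Controlling for `z`" is done here by residualizing both `x` and `y` on `z`, which gives the same coefficient on `x` as a multiple regression of `y` on `x` and `z` (the Frisch-Waugh-Lovell result).

```python
import random

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after an OLS fit on x, with an intercept."""
    n = len(x)
    b = slope(x, y)
    a = sum(y) / n - b * sum(x) / n
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

rng = random.Random(0)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]          # confounder
x = [zi + rng.gauss(0, 1) for zi in z]           # feature, driven by z
y = [2.0 * zi + rng.gauss(0, 1) for zi in z]     # outcome, driven by z only

naive = slope(x, y)                                   # y ~ x, z omitted
adjusted = slope(residuals(z, x), residuals(z, y))    # y ~ x, controlling for z

print(f"naive slope: {naive:.2f}, adjusted slope: {adjusted:.2f}")
```

Under these assumptions the naive slope lands near 1 even though the true effect of `x` is 0, and the adjusted slope lands near 0. The adjustment only works because `z` is observed and enters linearly; an unmeasured confounder would leave the bias in place.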
Part C — L1 vs L2 regularization
9. What are L1 (Lasso) and L2 (Ridge) regularization, and how do they differ in effect?
10. When and why would you prefer one over the other (feature selection, multicollinearity, prediction vs interpretability)?
11. How would you select the regularization strength (λ), and what are the practical tradeoffs?
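As an illustration for Part C, the sketch below shows how the two penalties act on a single coefficient in the simplified case of an orthonormal design (X'X = I). It assumes the objectives are scaled as (1/2)·RSS + λ·|b| for L1 and (1/2)·RSS + (λ/2)·b² for L2; other scalings change the constants but not the qualitative behavior. The function names and example coefficients are hypothetical.

```python
def ridge_shrink(b_ols, lam):
    """L2 solution under an orthonormal design: proportional shrinkage."""
    return b_ols / (1.0 + lam)

def lasso_shrink(b_ols, lam):
    """L1 solution under an orthonormal design: soft-thresholding."""
    magnitude = max(abs(b_ols) - lam, 0.0)
    return magnitude if b_ols >= 0 else -magnitude

# Hypothetical unpenalized (OLS) coefficients
ols = [3.0, 0.4, -0.2, -1.5]
lam = 0.5

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]

print("ridge:", [round(b, 3) for b in ridge])
print("lasso:", [round(b, 3) for b in lasso])
```

In this setting L2 scales every coefficient toward zero without ever reaching it, while L1 subtracts a fixed amount and sets any coefficient smaller than λ in magnitude to exactly zero, which is the source of its feature-selection behavior.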