Technical Screen: P-values and Robust Two-Sample/Paired Tests
Context: You are a data scientist evaluating healthcare interventions. Answer in clear, interview-ready explanations that a stakeholder could understand and that a peer could reproduce.
Part A — Plain-language p-value
Explain a p-value to a non-scientist without using the phrase "probability the null is true." Use a concrete everyday analogy and clarify common misinterpretations, including:
- p is not an effect size.
- p depends on the chosen test, analysis plan, and assumptions.
- A non-significant p does not prove no effect, and a significant p does not prove practical importance.
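The coin-flip analogy below is an illustrative assumption, not part of the original question. A p-value answers "if this coin were fair, how often would a run of flips look at least this lopsided?" The sketch computes that exactly with the standard library only. `fair_coin_p` is a hypothetical helper name, and the head counts are toy numbers chosen to show the misinterpretations listed above.

```python
def fair_coin_p(heads: int, flips: int, alternative: str = "two-sided") -> float:
    """Exact p-value for `heads` in `flips` tosses, assuming a fair coin (p = 0.5).

    "two-sided": outcomes at least as far from flips/2 as the observed one.
    "greater":   outcomes with at least as many heads as observed.
    """
    if alternative not in ("two-sided", "greater"):
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    gap = abs(2 * heads - flips)  # doubled distance from flips/2, kept as an integer
    extreme = 0
    ways = 1  # comb(flips, j), updated incrementally to avoid recomputing
    for j in range(flips + 1):
        if alternative == "two-sided":
            counts = abs(2 * j - flips) >= gap
        else:
            counts = j >= heads
        if counts:
            extreme += ways
        ways = ways * (flips - j) // (j + 1)  # comb(flips, j) -> comb(flips, j + 1)
    return extreme / 2 ** flips  # every sequence is equally likely under a fair coin


# Toy head counts, not data from any study.
for heads, flips in [(9, 10), (7, 10), (5200, 10000)]:
    print(f"{heads}/{flips} heads: share {heads / flips:.2f}, "
          f"two-sided p = {fair_coin_p(heads, flips):.2g}")
```

Under these assumptions, 9 heads in 10 flips gives p = 22/1024 ≈ 0.021 two-sided but 11/1024 ≈ 0.011 one-sided: same data, different test, different p. 5,200 heads in 10,000 flips gives p well below 0.001 although heads come up only 52% of the time, while 7 heads in 10 gives p ≈ 0.34 although a 70% heads rate would be a large bias. A small p is not a large effect, and a large p is not proof of a fair coin.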
Part B — Wilcoxon vs t-test
For each scenario, select and justify the appropriate test, state its assumptions and how you would check them, and give an effect size measure. Be explicit about how rank tests handle ties and zero differences, and about how you would report confidence intervals.
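Ties and zero differences are where hand-rolled rank tests usually go wrong, so here is a minimal sketch for paired differences (such as the pre/post BP scenario), using only the standard library. It assumes Wilcoxon's original zero method (drop zero differences, then rank), gives tied absolute differences their average (mid)rank, and computes an exact p-value by enumerating all sign patterns, which is feasible at n = 12. All function names are hypothetical. In practice `scipy.stats.wilcoxon` exposes the zero-handling choice through its `zero_method` argument ("wilcox", "pratt", "zsplit"). The Hodges–Lehmann estimate is included as the location estimate that naturally accompanies the signed-rank test.

```python
import itertools
import random
import statistics


def midranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def signed_rank_sums(diffs):
    """T+ and T- after dropping zero differences (Wilcoxon's zero method)."""
    nonzero = [d for d in diffs if d != 0]  # zeros carry no sign information
    ranks = midranks([abs(d) for d in nonzero])
    t_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return t_plus, t_minus, ranks


def exact_signed_rank_p(diffs):
    """Exact two-sided p by enumerating all 2**n sign patterns of the ranks.

    Conditions on the observed (mid)ranks, so it stays exact with ties;
    only practical for small n (n = 12 gives 4096 patterns).
    """
    t_plus, t_minus, ranks = signed_rank_sums(diffs)
    centre = (t_plus + t_minus) / 2
    observed = abs(t_plus - centre)
    hits = 0
    for signs in itertools.product((False, True), repeat=len(ranks)):
        t = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(t - centre) >= observed - 1e-9:  # tolerance for float sums
            hits += 1
    return hits / 2 ** len(ranks)


def rank_biserial(diffs):
    """Matched-pairs rank-biserial r = (T+ - T-) / (T+ + T-), in [-1, 1]."""
    t_plus, t_minus, _ = signed_rank_sums(diffs)
    total = t_plus + t_minus
    return 0.0 if total == 0 else (t_plus - t_minus) / total  # all-zero convention: 0


def hodges_lehmann(diffs):
    """Median of the Walsh averages (d_i + d_j) / 2 over all i <= j."""
    n = len(diffs)
    return statistics.median([(diffs[i] + diffs[j]) / 2 for i in range(n) for j in range(i, n)])


def cohens_dz(diffs):
    """Mean difference divided by the SD of the differences (sensitive to outliers)."""
    return statistics.fmean(diffs) / statistics.stdev(diffs)


def bootstrap_ci(diffs, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling whole pairs (i.e. their differences)."""
    rng = random.Random(seed)
    reps = sorted(stat(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot))
    alpha = (1 - level) / 2
    return reps[int(alpha * (n_boot - 1))], reps[int((1 - alpha) * (n_boot - 1))]
```

A report would then pair `exact_signed_rank_p`, `rank_biserial` and `hodges_lehmann` with `bootstrap_ci(diffs, rank_biserial)` and `bootstrap_ci(diffs, hodges_lehmann)`. With only 12 pairs, a percentile bootstrap can undercover, so a BCa interval or the exact Hodges–Lehmann interval may be the safer choice.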
- Paired data (n = 12): Systolic BP is measured before and after a low-sodium diet. The distribution of paired differences is skewed, with outliers. Choose between a paired t-test and a Wilcoxon signed-rank test. State:
  - The assumptions and how you'd check them.
  - How ties and zero differences are handled.
  - An appropriate effect size (e.g., Cohen's dz vs the rank-biserial or matched-pairs r) and its CI.
- Independent samples (n1 = 18, n2 = 25): Compare length of stay between two clinics with unequal variances and heavy-tailed, non-normal distributions. Choose between Welch's t-test and the Wilcoxon rank-sum (Mann–Whitney) test. Discuss:
  - What Mann–Whitney estimates (the probability of superiority) vs Welch's focus on the mean difference.
  - When each is preferable, given the goals and the features of the data.
  - How you'd complement each test with confidence intervals and an effect size (e.g., Hedges' g with heteroskedasticity-consistent HC3 standard errors, or Cliff's delta as a rank-based, robust alternative).
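For the independent-samples scenario, the sketch below (standard library only, hypothetical function names) contrasts the two estimands. `superiority` returns the probability of superiority A = P(X > Y) + 0.5·P(X = Y), which equals U / (n1·n2) for the Mann–Whitney U of the first sample, together with Cliff's delta = 2A − 1. `welch` returns the mean difference, its SE, the t statistic and the Welch–Satterthwaite df. `hc3_se` assumes the usual two-group regression y ~ group: each observation's leverage is 1/n_g, so the HC3 variance of the group coefficient reduces to s1²/(n1 − 1) + s2²/(n2 − 1), slightly larger than Welch's s1²/n1 + s2²/n2 (which HC2 reproduces). p-values and t quantiles need a t distribution, for example via `scipy.stats.ttest_ind(x, y, equal_var=False)` or `scipy.stats.mannwhitneyu(x, y)`, so they are left out here.

```python
import random
import statistics


def superiority(x, y):
    """Return (A, Cliff's delta): A = P(X > Y) + 0.5 * P(X = Y) over all cross-pairs."""
    gt = sum(1 for a in x for b in y if a > b)
    eq = sum(1 for a in x for b in y if a == b)  # ties count as half a "win"
    a_stat = (gt + 0.5 * eq) / (len(x) * len(y))
    return a_stat, 2 * a_stat - 1


def welch(x, y):
    """Mean difference, Welch SE, t statistic and Welch-Satterthwaite df."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x) / n1, statistics.variance(y) / n2
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, se, diff / se, df


def hc3_se(x, y):
    """HC3 SE of the group coefficient in y ~ group (leverage 1/n_g per group)."""
    return (statistics.variance(x) / (len(x) - 1)
            + statistics.variance(y) / (len(y) - 1)) ** 0.5


def hedges_g(x, y):
    """Standardized mean difference (pooled SD) with the small-sample correction J."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # approximate bias correction, J = 1 - 3/(4*df - 1)
    return j * (statistics.fmean(x) - statistics.fmean(y)) / pooled_var ** 0.5


def two_sample_bootstrap_ci(x, y, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling each group separately."""
    rng = random.Random(seed)
    reps = sorted(stat(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
                  for _ in range(n_boot))
    alpha = (1 - level) / 2
    return reps[int(alpha * (n_boot - 1))], reps[int((1 - alpha) * (n_boot - 1))]
```

For example, `two_sample_bootstrap_ci(x, y, lambda a, b: superiority(a, b)[1])` gives an interval for Cliff's delta, and passing `lambda a, b: statistics.fmean(a) - statistics.fmean(b)` gives one for the mean difference. Bootstrapping statistics that divide by a sample variance is avoided here because a small resample can have zero variance. With heavy tails, mean-based intervals can remain unstable even with robust SEs, which is one argument for reporting the rank-based delta alongside them.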