A/B Test: Compare Mean Watch Time Between Variants A and B
Context: You ran an A/B test measuring per-user daily watch_time (in seconds). You obtained two independent samples:
- Variant A: n_A = 500, mean_A = 75, sd_A = 18
- Variant B: n_B = 520, mean_B = 78, sd_B = 19
Assume independence and randomization; test for a difference in population means (B vs A).
Tasks:
- Compute the pooled-variance (equal-variance) and Welch (unequal-variance) two-sample t-statistics and corresponding p-values.
- If the product goal is "increase watch_time," decide whether a one-tailed or two-tailed test is appropriate and justify rigorously.
- Construct a 95% confidence interval for the mean difference using Welch’s method.
- Compute Cohen’s d and Hedges’ g effect sizes.
- If normality is questionable but sample sizes are large, explain why the t-test is still valid; if heavy tails are suspected, propose a robust alternative and discuss trade-offs.
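
The numerical tasks above can be checked with a short script. The following is a minimal sketch using only the Python standard library, not a reference solution. The helpers `t_cdf` and `t_ppf` are hypothetical names introduced here for illustration: they approximate the Student's t distribution by Simpson integration and bisection, where a statistics library would normally be used. Hedges' g uses the common approximate correction factor 1 - 3/(4*df - 1), which is an assumption about which variant of the correction is intended.

```python
import math

# Summary statistics from the problem statement.
n_a, mean_a, sd_a = 500, 75.0, 18.0
n_b, mean_b, sd_b = 520, 78.0, 19.0


def t_cdf(t, df, steps=2000):
    """Student's t CDF via Simpson's rule on the density (illustrative helper)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + math.copysign(area, t)


def t_ppf(p, df):
    """Student's t quantile by bisection on t_cdf (illustrative helper)."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


diff = mean_b - mean_a  # B minus A
var_a, var_b = sd_a ** 2, sd_b ** 2

# Pooled-variance (equal-variance) t-test.
df_pooled = n_a + n_b - 2
sp2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df_pooled
se_pooled = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
t_pooled = diff / se_pooled
p_pooled = 2 * (1 - t_cdf(abs(t_pooled), df_pooled))  # two-sided

# Welch (unequal-variance) t-test with Welch-Satterthwaite degrees of freedom.
qa, qb = var_a / n_a, var_b / n_b
se_welch = math.sqrt(qa + qb)
t_welch = diff / se_welch
df_welch = (qa + qb) ** 2 / (qa ** 2 / (n_a - 1) + qb ** 2 / (n_b - 1))
p_welch = 2 * (1 - t_cdf(abs(t_welch), df_welch))  # two-sided
p_welch_one_sided = 1 - t_cdf(t_welch, df_welch)   # H1: mean_B > mean_A

# 95% Welch confidence interval for mean_B - mean_A.
t_crit = t_ppf(0.975, df_welch)
ci = (diff - t_crit * se_welch, diff + t_crit * se_welch)

# Effect sizes: Cohen's d (pooled SD) and Hedges' g (small-sample correction).
cohen_d = diff / math.sqrt(sp2)
hedges_g = cohen_d * (1 - 3 / (4 * df_pooled - 1))

print(f"pooled: t={t_pooled:.4f}, df={df_pooled}, p={p_pooled:.4f}")
print(f"welch:  t={t_welch:.4f}, df={df_welch:.1f}, p={p_welch:.4f}")
print(f"welch one-sided p={p_welch_one_sided:.4f}")
print(f"95% CI for B - A: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Cohen's d={cohen_d:.4f}, Hedges' g={hedges_g:.4f}")
```

Because the two standard deviations and sample sizes are nearly equal, the pooled and Welch results should agree closely; the script only covers the computational tasks, and the one- versus two-tailed choice and the robustness discussion still need a written justification.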