CTR as a Proportion: Estimation, Confidence Intervals, and Day-Level Variability
Context: Click-through rate (CTR) is a proportion metric defined as C/I where C is clicks and I is impressions. Assume impressions are independent and C | I ~ Binomial(I, p).
1) Single-day CTR: point estimate, SE, and 95% CIs
Given I = 200,000 and C = 4,200:
-
Estimate p̂ and its standard error.
-
Compute 95% confidence intervals using:
a) Normal/Wald with continuity correction
b) Wilson score interval
c) Exact Clopper–Pearson
-
Compare widths and coverage properties and state which you would use in production and why.
2) Pooled CTR across multiple days
Three days with impressions [50,000, 120,000, 30,000] and CTRs [2.0%, 2.6%, 1.8%]. Compute:
a) The pooled CTR across the three days.
b) The standard error of the pooled CTR.
c) The day-to-day standard deviation of CTR treating days as the unit (unweighted STDDEV_SAMP) vs an impression-weighted day-level SD. When is each appropriate?
3) Sample vs population variance in practice
Explain why we divide by n−1 (sample variance) vs n (population variance). In SQL, when would you prefer STDDEV_SAMP vs STDDEV_POP for daily CTR and CPC aggregates? Provide concrete examples tied to experiment analysis.