This question evaluates understanding of estimation and inference for proportions, including CTR point estimation, standard errors, multiple confidence-interval methods (Wald, Wilson, Clopper–Pearson), pooled estimation across days, and the distinction between sample and population variance for day-level variability.
Context: Click-through rate (CTR) is a proportion metric defined as C/I where C is clicks and I is impressions. Assume impressions are independent and C | I ~ Binomial(I, p).
Given I = 200,000 and C = 4,200:
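The individual parts of this sub-question are not shown, but the overview names a point estimate, a standard error, and Wald, Wilson, and Clopper–Pearson intervals. The following is a minimal sketch of those quantities for I = 200,000 and C = 4,200, assuming a 95% confidence level (the source does not state one). All function and variable names are illustrative. Clopper–Pearson is usually computed from beta quantiles; here it is solved by bisection on the exact binomial CDF so that only the Python standard library is needed, and the sketch assumes 0 < C < I.

```python
import math
from statistics import NormalDist

def wald_interval(c, n, z):
    """Normal-approximation interval: p_hat +/- z * SE."""
    p = c / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def wilson_interval(c, n, z):
    """Wilson score interval (no continuity correction)."""
    p = c / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def clopper_pearson_interval(c, n, alpha=0.05, iters=60):
    """Exact interval via bisection on the binomial CDF (assumes 0 < c < n)."""
    # Log binomial coefficients for k = 0..c, computed once and reused.
    log_comb = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        for k in range(c + 1)
    ]

    def cdf(k_max, p):
        # P(X <= k_max) for X ~ Binomial(n, p), summed in log space per term.
        lp, lq = math.log(p), math.log1p(-p)
        return math.fsum(
            math.exp(log_comb[k] + k * lp + (n - k) * lq)
            for k in range(k_max + 1)
        )

    def bisect(f, lo, hi):
        # f is increasing in p with f(lo) < 0 < f(hi).
        for _ in range(iters):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    p_hat = c / n
    # Lower bound: P(X >= c | p) = alpha / 2.
    lower = bisect(lambda p: (1 - cdf(c - 1, p)) - alpha / 2, 0.0, p_hat)
    # Upper bound: P(X <= c | p) = alpha / 2.
    upper = bisect(lambda p: alpha / 2 - cdf(c, p), p_hat, 1.0)
    return lower, upper

impressions, clicks = 200_000, 4_200
z = NormalDist().inv_cdf(0.975)          # two-sided 95% critical value

p_hat = clicks / impressions             # point estimate of CTR
se = math.sqrt(p_hat * (1 - p_hat) / impressions)
wald = wald_interval(clicks, impressions, z)
wilson = wilson_interval(clicks, impressions, z)
cp = clopper_pearson_interval(clicks, impressions)

print(f"CTR = {p_hat:.4%}, SE = {se:.6f}")
for name, (lo, hi) in [("Wald", wald), ("Wilson", wilson), ("Clopper-Pearson", cp)]:
    print(f"{name:16s} [{lo:.6f}, {hi:.6f}]")
```

With this many impressions the three intervals should nearly coincide; the differences between methods matter more when I is small or the CTR is close to 0 or 1.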
Consider three days with impressions [50,000, 120,000, 30,000] and CTRs [2.0%, 2.6%, 1.8%]. Compute:
a) The pooled CTR across the three days.
b) The standard error of the pooled CTR.
c) The day-to-day standard deviation of CTR, treating days as the unit (unweighted STDDEV_SAMP), versus an impression-weighted day-level SD. When is each appropriate?
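The arithmetic for parts a) to c) can be sketched as follows. This is one reasonable reading, not a definitive answer: the pooled standard error assumes the binomial model from the context with a single p on all days, and the impression-weighted SD is taken here as a population-style weighted SD around the pooled CTR with no small-sample correction, since the question does not pin down a specific weighted formula. Variable names are illustrative.

```python
import math
from statistics import stdev

impressions = [50_000, 120_000, 30_000]
ctrs = [0.020, 0.026, 0.018]

# a) Pooled CTR: total clicks over total impressions.
clicks = [round(i * r) for i, r in zip(impressions, ctrs)]   # 1000, 3120, 540
total_impressions = sum(impressions)
pooled_ctr = sum(clicks) / total_impressions

# b) Binomial standard error of the pooled CTR (assumes one common p).
pooled_se = math.sqrt(pooled_ctr * (1 - pooled_ctr) / total_impressions)

# c1) Unweighted day-level SD with the n-1 divisor (what STDDEV_SAMP computes).
sd_unweighted = stdev(ctrs)

# c2) Impression-weighted SD around the pooled CTR (assumed definition).
weights = [i / total_impressions for i in impressions]
sd_weighted = math.sqrt(
    sum(w * (r - pooled_ctr) ** 2 for w, r in zip(weights, ctrs))
)

print(f"pooled CTR     = {pooled_ctr:.4%}")
print(f"pooled SE      = {pooled_se:.6f}")
print(f"unweighted SD  = {sd_unweighted:.6f}")
print(f"weighted SD    = {sd_weighted:.6f}")
```

The day-level SD is roughly ten times the binomial standard error here, which suggests that day-to-day variation is far larger than impression-level sampling noise alone would produce.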
Explain why we divide by n−1 (sample variance) vs n (population variance). In SQL, when would you prefer STDDEV_SAMP vs STDDEV_POP for daily CTR and CPC aggregates? Provide concrete examples tied to experiment analysis.
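One way to see the n−1 vs n distinction is a small exact enumeration. The sketch below uses a hypothetical toy population {1, 2, 3} (not data from the question), draws every possible sample of size 2 with replacement, and averages the two variance estimators over all samples. Dividing by n underestimates the true variance on average because deviations are measured from the sample mean rather than the unknown population mean; dividing by n−1 corrects this. In SQL terms, STDDEV_POP uses the n divisor and STDDEV_SAMP uses n−1.

```python
from fractions import Fraction
from itertools import product

def variance(values, ddof):
    """Sum of squared deviations from the mean, divided by (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)

# Hypothetical toy population, kept as exact fractions.
population = [Fraction(1), Fraction(2), Fraction(3)]
true_var = variance(population, ddof=0)

# Every ordered sample of size 2 drawn with replacement (9 samples).
samples = list(product(population, repeat=2))

avg_div_n = sum(variance(s, ddof=0) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, ddof=1) for s in samples) / len(samples)

print("true population variance :", true_var)
print("mean of divide-by-n      :", avg_div_n)
print("mean of divide-by-(n-1)  :", avg_div_n_minus_1)
```

Applied to the question: if the observed days are a sample standing in for a longer period (for example, a few experiment days used to estimate typical daily variability), the n−1 form is the natural choice; if the days are the entire set of interest, the n form describes that set exactly.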