This question evaluates a data scientist's experimental-design and causal-inference skills applied to ad ranking, including sample-size estimation, variance-reduction methods, interference reasoning (auctions, budget pacing, frequency caps), and marketplace distortion diagnostics.
You are evaluating a new ads ranking model expected to increase revenue but potentially harm user experience (UX). The marketplace has hourly budget pacing, a second-price auction, and per-campaign frequency caps. Cross-device identity is imperfect.
Tasks:
A) Define a North Star metric and at least three guardrail metrics, with precise formulas and acceptable movement ranges, and explain the trade-offs between the North Star and each guardrail (an illustrative formula example follows part F).
B) Choose the randomization unit (request-, session-, or user-level) and justify your choice by addressing interference (auction spillovers, budget pacing, frequency caps) and imperfect cross-device identity.
C) Provide a sample-size calculation to detect a +2% lift in the primary metric with 80% power and a 5% two-sided alpha. State all assumptions (the unit at which variance is estimated, clustering, the expected variance reduction from CUPED/stratification) and show your calculation (a power-calculation sketch follows part F); then describe how you would re-estimate the required sample size during the test.
D) Detail the variance-reduction techniques you would use (e.g., CUPED with pre-period user RPM, stratification by geo/device/ad vertical) and explain why each is valid (a CUPED sketch follows part F).
E) Specify a ramp plan (percentages and duration per stage), early-stop criteria, and how you would handle the winner's curse and novelty effects (a ramp and early-stop sketch follows part F).
F) Describe the diagnostics you would run for marketplace distortions (budget reallocation, cannibalization across campaigns, price feedback loops) and how you would correct for them before a full rollout (a budget-skew sketch follows below).
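For part A, one illustrative (not uniquely correct) choice is revenue per mille impressions (RPM) as the North Star, with a UX guardrail defined per impression; the formulas below are worked examples, and the acceptable movement ranges would be negotiated with stakeholders rather than derived:

$$
\mathrm{RPM} = 1000 \times \frac{\text{total ad revenue}}{\text{total ad impressions}},
\qquad
\text{negative-feedback rate} = \frac{\text{ad hides} + \text{ad reports}}{\text{total ad impressions}}
$$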
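For part C, a minimal power-calculation sketch in Python. The baseline mean, user-level standard deviation, and CUPED variance reduction are assumptions for illustration (the question does not supply them), and the +2% is read as a relative lift:

```python
# Sample-size sketch: users per arm to detect a +2% relative lift with
# 80% power at a 5% two-sided alpha. All numeric inputs are assumptions.
from statsmodels.stats.power import NormalIndPower

baseline_mean = 0.50   # assumed revenue per user per day, in dollars
baseline_sd = 2.00     # assumed user-level SD (ad revenue is heavy-tailed)
relative_lift = 0.02   # minimum detectable effect: +2% relative
cuped_r2 = 0.30        # assumed share of variance removed by CUPED

abs_lift = baseline_mean * relative_lift        # $0.01 per user
adj_sd = baseline_sd * (1.0 - cuped_r2) ** 0.5  # SD after CUPED adjustment
effect_size = abs_lift / adj_sd                 # standardized effect ~ 0.006

n_per_arm = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"~{n_per_arm:,.0f} users per arm")  # ~440,000 under these assumptions
```

If the randomization unit is coarser than the analysis unit, the design effect 1 + (m − 1)·ICC (m = average cluster size) multiplies this n, which is the usual way clustering enters the calculation.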
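For part D, a minimal CUPED sketch, assuming each user carries a pre-experiment covariate such as pre-period RPM; the function name and inputs are illustrative:

```python
# CUPED adjustment: subtract the part of the metric that a pre-experiment
# covariate predicts. Valid because the covariate is fixed before
# randomization, so the adjustment cannot bias the treatment contrast.
import numpy as np

def cuped_adjust(y: np.ndarray, x_pre: np.ndarray) -> np.ndarray:
    """Return y - theta * (x_pre - mean(x_pre)), with
    theta = cov(y, x_pre) / var(x_pre), the variance-minimizing choice."""
    theta = np.cov(y, x_pre, ddof=1)[0, 1] / np.var(x_pre, ddof=1)
    return y - theta * (x_pre - x_pre.mean())
```

The adjusted metric's variance shrinks by roughly a factor of 1 − ρ², where ρ is the correlation between the metric and the pre-period covariate; that factor is exactly the cuped_r2 assumption in the sample-size sketch above.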
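For part E, an illustrative ramp schedule plus a simple guardrail early-stop check. The stages, durations, and threshold are assumptions for the sketch, not a prescribed plan; a real design would pre-register group-sequential boundaries to control alpha across looks:

```python
# Illustrative ramp: small stages first, so marketplace feedback loops
# (pacing, auction prices) can be observed before exposing more traffic.
from scipy import stats

RAMP = [(0.01, "2 days"), (0.05, "3 days"), (0.20, "1 week"), (0.50, "2 weeks")]

def guardrail_breach(delta_hat: float, se: float, floor: float) -> bool:
    """Stop the ramp when even the optimistic end of a one-sided 95% CI on
    the guardrail delta (treatment - control) sits below the pre-agreed
    floor, i.e., harm beyond the acceptable range is statistically clear."""
    upper = delta_hat + stats.norm.ppf(0.95) * se
    return upper < floor  # floor is negative, e.g. -0.5% on a UX guardrail
```

Running the final stage for a longer window addresses novelty effects, and re-estimating the shipped effect on a fresh holdout counters the winner's curse (the launched estimate is otherwise biased upward by selection).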
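For part F, one diagnostic sketch: under a user-level 50/50 split, a campaign's shared hourly budget should spend roughly in proportion to traffic share, so treatment-skewed spend flags budget reallocation across arms, which inflates the measured revenue lift. The column names and helper are illustrative:

```python
# Budget-skew diagnostic: treatment share of each campaign's spend minus
# the treatment traffic share. Values far from zero suggest the new ranker
# is pulling shared budget away from control users.
import pandas as pd

def budget_skew(df: pd.DataFrame, treatment_share: float = 0.5) -> pd.Series:
    """df columns (assumed): campaign_id, arm ('control'/'treatment'), spend."""
    spend = df.pivot_table(index="campaign_id", columns="arm",
                           values="spend", aggfunc="sum").fillna(0.0)
    share = spend["treatment"] / (spend["treatment"] + spend["control"])
    return (share - treatment_share).sort_values(ascending=False)
```

If the skew is material, standard corrections before rollout include splitting each campaign's budget between arms in proportion to traffic, campaign-level or switchback randomization, or a budget-split holdback to re-measure revenue without cross-arm leakage.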