Experiment Design: Replacing Rule-Based Ad Ranking with a Recommender
Context
You are launching a new machine-learning–based ad ranking system to replace a rule-based ranker. The UI and auction rules remain unchanged. You must plan metrics, testing, analysis, visualization, and executive communication for the launch.
Tasks
- Metrics: Define the primary success metric for the launch and the key guardrail metrics across user experience, advertiser outcomes, marketplace health, and system reliability. Explain why.
- A/B Test Design: Describe how you would randomize and analyze the experiment so control and treatment populations are truly comparable, accounting for seasonality and marketplace interference (e.g., budgets, pacing, frequency caps).
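One common way to implement stable unit-level randomization is deterministic hashing of the unit ID with an experiment salt. A minimal sketch, assuming user-level assignment (the function and salt names here are illustrative, not part of any specific platform):

```python
import hashlib

def assign(unit_id: str, salt: str = "ad_ranker_v1", treatment_frac: float = 0.5) -> str:
    """Deterministically assign a unit to control or treatment.

    Hashing (salt + unit_id) yields a stable, approximately uniform bucket,
    so a user stays in the same arm across sessions, and re-randomizing a
    new experiment only requires changing the salt.
    """
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map to [0, 1]
    return "treatment" if bucket < treatment_frac else "control"
```

Because assignment is a pure function of the ID, any service can reproduce it without a lookup table; stratification can be layered on top by checking pre-period covariate balance across the resulting arms.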
- Sample Size and Duration: Provide the formulas you would use to estimate required sample size and test duration for (a) a proportion metric like CTR and (b) a continuous metric like revenue per user-day (or RPM). State your assumptions.
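The standard two-sided z-test power formulas can be sketched directly. Assuming equal arm sizes, significance level alpha, and an absolute minimum detectable effect (MDE), the per-arm sample size is n = (z_{1-alpha/2} + z_{power})^2 * 2 * var / MDE^2, where var is p(1-p) for a proportion and sigma^2 for a continuous metric:

```python
from statistics import NormalDist

def n_per_arm_proportion(p: float, mde_abs: float,
                         alpha: float = 0.05, power: float = 0.8) -> float:
    """Per-arm sample size for a proportion metric (e.g., CTR).

    Assumes a two-sided z-test, equal arms, and baseline variance
    p * (1 - p) in both arms.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2 * 2 * p * (1 - p) / mde_abs ** 2

def n_per_arm_continuous(sigma: float, mde_abs: float,
                         alpha: float = 0.05, power: float = 0.8) -> float:
    """Per-arm sample size for a continuous metric (e.g., revenue per user-day)
    with standard deviation sigma, under the same assumptions."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2 * 2 * sigma ** 2 / mde_abs ** 2

# Duration follows from traffic: days = 2 * n_per_arm / eligible_users_per_day,
# rounded up to whole weeks to cover day-of-week seasonality.
```

For example, detecting a +5% relative lift on a 2% baseline CTR (0.001 absolute) requires roughly 308k users per arm at alpha = 0.05 and 80% power.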
- CTR vs Advertiser Spend: If treatment CTR rises 5%, will advertisers necessarily spend more? Explain the causal path and the additional analyses you would run to diagnose and forecast spend effects.
- Visualization Critique: You are given a line chart where treatment CTR is already higher than control during the pre-launch period. What is wrong with this picture? How would you fix the visualization and/or the design?
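One standard design fix for a pre-launch imbalance like this is to use the pre-period metric as a covariate (CUPED-style adjustment), which removes pre-existing differences between arms and reduces variance. A minimal sketch, assuming per-user (pre, post) metric pairs pooled across both arms; `cuped_adjust` is an illustrative name, not a library function:

```python
def cuped_adjust(post: list[float], pre: list[float]) -> list[float]:
    """CUPED adjustment: Y_adj = Y - theta * (X - mean(X)),
    with theta = cov(X, Y) / var(X).

    Assumes the pre-period covariate X is unaffected by treatment.
    The overall mean of the metric is preserved, but between-arm
    differences driven by pre-period imbalance are removed.
    """
    n = len(post)
    mx = sum(pre) / n
    my = sum(post) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(pre, post)) / (n - 1)
    var = sum((x - mx) ** 2 for x in pre) / (n - 1)
    theta = cov / var
    return [y - theta * (x - mx) for x, y in zip(pre, post)]
```

On the visualization side, plotting the adjusted treatment-minus-control difference with a marked launch date and confidence bands makes any true lift visible instead of being masked by the baseline gap.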
- Executive Summary: How would you summarize results, next steps, and risks for senior leadership? Provide a concise structure.
Hints
Think about the metric hierarchy, the randomization unit and stratification, power and minimum detectable effect (MDE), advertiser incentives and auction dynamics, visualization best practices, and executive storytelling.