You work on a mobile travel app (similar to TripAdvisor) that will test a new push-notification policy recommending nearby attractions. Design a rigorous online experiment that accounts for network effects and includes strong guardrails.
Address the following:
- Define the primary success metric and justify it over plausible alternatives (e.g., 7-day retained sessions per user vs booking rate vs content views). Specify the exact numerator/denominator and the attribution window.
- Specify guardrail metrics that must not regress (e.g., uninstall rate, notification unsubscribe rate, spam-report rate). Propose alert thresholds and whether they are one-sided or two-sided; explain how you will do sequential monitoring (e.g., spending functions) to avoid inflated Type I error.
- Choose the cluster unit for randomization (e.g., city, language-locale, connected components in the follow graph). Explain spillover pathways and tradeoffs between larger vs smaller clusters, unequal cluster sizes, and cross-border travelers.
- Show how you would power the test under cluster randomization. Define intracluster correlation ρ and average cluster size m, compute the design effect DE = 1 + (m−1)ρ, and illustrate with a numeric example how the required sample size changes vs individual randomization.
- Describe your randomization and stratification plan (platform, notification eligibility, baseline activity), and how you would validate balance at both user and cluster levels.
- Lay out the analysis plan: cluster-level difference-in-means vs user-level models with cluster-robust SEs vs mixed-effects with random intercepts; include covariate adjustment (e.g., pre-period outcomes, CUPED). Clarify how you will handle partial exposure and users moving across clusters.
- Explain how you would detect and quantify spillovers (e.g., two-stage randomization, exposure variables like % of a user’s friends in treatment). How would this change your estimand and analysis?
- Operational safeguards: staged rollout, early-stop criteria when guardrails breach, and what you monitor in the first 24–48 hours.
Make your answers precise and formula-driven where applicable.
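
For reference, the design-effect formula named in the power item, DE = 1 + (m−1)ρ, can be sketched numerically as below. This is a minimal illustration, not a prescribed answer: the values of m, ρ, the effect size, and the standard deviation are made-up assumptions, the function names are hypothetical, and the formula assumes roughly equal cluster sizes and a two-sided test on a difference in means.

```python
import math
from statistics import NormalDist


def design_effect(m: float, rho: float) -> float:
    """DE = 1 + (m - 1) * rho for average cluster size m and ICC rho."""
    return 1.0 + (m - 1.0) * rho


def n_per_arm_individual(delta: float, sigma: float,
                         alpha: float = 0.05, power: float = 0.80) -> float:
    """Users per arm under individual randomization (two-sided z-test
    on a difference in means with common standard deviation sigma)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2


# Illustrative assumptions only (not taken from any real data):
m, rho = 1000, 0.01          # average cluster size, intracluster correlation
delta, sigma = 0.05, 1.0     # minimum detectable effect, outcome std dev

de = design_effect(m, rho)                    # 1 + 999 * 0.01 = 10.99
n_ind = n_per_arm_individual(delta, sigma)    # users per arm, individual design
n_clu = n_ind * de                            # users per arm, cluster design
clusters_per_arm = math.ceil(n_clu / m)       # clusters needed per arm

print(f"DE = {de:.2f}")
print(f"users/arm (individual) = {math.ceil(n_ind)}")
print(f"users/arm (cluster)    = {math.ceil(n_clu)}")
print(f"clusters/arm           = {clusters_per_arm}")
```

Under these assumed inputs, even a small ρ inflates the required sample roughly elevenfold, which is why the number of clusters, rather than the number of users, tends to be the binding constraint.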