You work on a two‑sided travel search marketplace and product wants to add a “High Wi‑Fi” badge/filter in the search bar to help remote workers. Recommend whether to launch based on an experiment you design. Be specific and address the following:
-
Randomization under interference: choose and justify one design (user‑level A/B, market‑day switchback, or geo/cluster randomization). Explicitly discuss potential SUTVA violations (demand spillovers, supply re‑ranking, word‑of‑mouth) and how your design mitigates them.
-
Metrics: pre‑specify primary success metrics (e.g., GMV/visitor, bookings/1k searches, conversion%, AOV) and guardrails (bounce%, latency p95, partner cancellations, non‑Wi‑Fi listing CTR). Define exactly how each is computed and at what aggregation level.
-
Intermediate metrics if topline is flat: badge/filter usage rate, CTR to Wi‑Fi listings, dwell time, queries per session, supply coverage with Wi‑Fi badge, search refinement rate.
-
Power, sample size, and runtime: compute MDE and duration using these constraints and baselines: daily active searchers=2,000,000; baseline search→booking conversion=3.0%; AOV=$120; variable margin=12%; treatment share=50%; desired power=0.80; alpha=0.05; cluster choice implies ICC=0.02 with average market‑day cluster size=20,000 visitors; stop only at full weeks. Product expects a 0.20–0.40 percentage‑point lift in conversion—show whether this is detectable and how many weeks are needed under your chosen design (include design effect if clustered).
-
Analysis plan: variance reduction (e.g., CUPED/stratification), outlier handling, heterogeneity (remote‑heavy markets, mobile vs desktop, business vs leisure), pre‑commit stopping rule, and how you will check randomization balance and interference diagnostics.
-
Decision rule to NPV: translate estimated effects into incremental margin dollars. Assume a one‑time engineering cost of $400k and an ongoing partner churn risk if CTR to non‑Wi‑Fi listings drops >5%. State precise launch/hold/kill thresholds.
-
If the A/B is impractical (e.g., heavy interference), propose a quasi‑experiment (staggered geo roll‑out with difference‑in‑differences and negative‑control outcomes). Specify the exact calendar, identification assumptions, and robustness checks.
Deliver: a written design, formulas used, and the numeric recommendation to launch or not under plausible effect sizes.