A/B Test Design: Optional "Work From Home" Filter on Search Page
You are designing an online controlled experiment for a marketplace search page that adds a visible "Work from home" (WFH) filter that users may choose to apply. When clicked, the filter narrows listings to remote-friendly options. The goal is to estimate the causal effect of showing this filter on conversion, defined as bookings per visit (bookings/visits).
Provide a complete test plan that specifies:
(a) Hypotheses
- State the precise null (H0) and alternative (H1) hypotheses for the primary metric.
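One common way to test H0: p_T = p_C against the two-sided H1: p_T ≠ p_C is a pooled two-proportion z-test. The sketch below is illustrative only. It assumes each visit converts at most once and that visits are independent. Under visitor-level randomization, the standard error should instead come from the delta method or a visitor-clustered estimator. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(bookings_t, visits_t, bookings_c, visits_c):
    """Pooled two-sided z-test of H0: p_T == p_C vs H1: p_T != p_C.

    Assumes at most one booking per visit and independent visits
    (only an approximation when randomization is at the visitor level).
    """
    p_t = bookings_t / visits_t
    p_c = bookings_c / visits_c
    # Under H0 both arms share one conversion rate, so pool them.
    p_pool = (bookings_t + bookings_c) / (visits_t + visits_c)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / visits_t + 1 / visits_c))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, z, p_value
```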
(b) Experimental Unit, Exposure/Trigger, Identity Handling
- Choose the unit of randomization (e.g., visitor, household).
- Define the exposure/trigger (e.g., search-page load) used to include observations in analysis.
- Explain how returning users, logged-in status, cookies, and cross-device identities are handled (sticky assignment, identity stitching).
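Sticky assignment is often implemented by hashing a stable identifier together with an experiment-specific salt. The same visitor then lands in the same arm on every visit without any assignment table lookup. The sketch below prefers a logged-in user ID and falls back to a first-party cookie. The salt, key format, and function names are hypothetical. The sketch does not handle the case where a cookie-only visitor later logs in and receives a new key. A real system would persist the first assignment or stitch identities before hashing.

```python
import hashlib


def assignment_key(user_id=None, cookie_id=None):
    """Prefer the logged-in user ID; fall back to a first-party cookie ID."""
    if user_id:
        return f"user:{user_id}"
    if cookie_id:
        return f"cookie:{cookie_id}"
    raise ValueError("no stable identifier available")


def assign_arm(unit_key, salt="wfh_filter_v1", treatment_share=0.5):
    """Deterministic (sticky) assignment: same key + salt -> same arm."""
    digest = hashlib.sha256(f"{salt}:{unit_key}".encode("utf-8")).hexdigest()
    # Map the first 60 bits of the hash to a roughly uniform number in [0, 1).
    bucket = int(digest[:15], 16) / 16**15
    return "treatment" if bucket < treatment_share else "control"
```

Changing the salt reshuffles all units. This is useful for reusing traffic across experiments without carrying over assignments.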
(c) Assignment and Compliance Rules
- How users are assigned to treatment (filter visible) vs. control (filter hidden).
- How to enforce that treatment users see the filter and control users do not.
- How to prevent cross-arm contamination (e.g., URL params, caching, shared devices).
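One way to enforce the arms is to decide filter visibility server-side from the stored assignment only, never from request parameters. The sketch below assumes the filter state travels in a hypothetical `wfh` query parameter. It strips that parameter for control users, for example when someone opens a link shared by a treatment user, and flags the request as contaminated so it can be counted. It also includes the arm in the cache key so the two arms never share a cached page. All names are illustrative.

```python
from urllib.parse import parse_qsl, urlencode

WFH_PARAM = "wfh"  # hypothetical query-parameter name for the filter


def render_context(arm, query_string):
    """Decide filter visibility server-side from the stored arm only."""
    params = parse_qsl(query_string, keep_blank_values=True)
    show_filter = arm == "treatment"
    # A control user arriving with the filter parameter is logged as contaminated.
    contaminated = not show_filter and any(k == WFH_PARAM for k, _ in params)
    if not show_filter:
        # Drop the filter parameter so control never sees filtered results.
        params = [(k, v) for k, v in params if k != WFH_PARAM]
    clean_query = urlencode(params)
    # Including the arm in the cache key keeps arms from sharing cached pages.
    cache_key = f"search|arm={arm}|{urlencode(sorted(params))}"
    return {"show_filter": show_filter, "query": clean_query,
            "cache_key": cache_key, "contaminated": contaminated}
```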
(d) Metric Definition and Attribution
- Precise numerator and denominator for the primary metric.
- 28-day post-visit attribution window for bookings.
- Tie-breaking when multiple visits occur before a booking.
- Whether credit is assigned at the visit- or visitor-level (and any sensitivity alternative).
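A minimal sketch of visit-level credit uses a last-touch tie-break. Each booking is credited to the visitor's most recent visit at or before the booking time, provided that visit falls within the 28-day window. Bookings with no qualifying visit get no credit. The last-touch rule and the tuple layouts are assumptions; first-touch or fractional credit are alternatives. A visitor-level sensitivity analysis would instead mark a visitor as converted if any booking falls within 28 days of any of their visits.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(days=28)


def attribute_bookings(visits, bookings, window=ATTRIBUTION_WINDOW):
    """Last-touch attribution of each booking to at most one prior visit.

    visits:   iterable of (visit_id, visitor_id, visit_time)
    bookings: iterable of (booking_id, visitor_id, booking_time)
    Returns {booking_id: visit_id or None}.
    """
    by_visitor = {}
    for visit_id, visitor_id, ts in visits:
        by_visitor.setdefault(visitor_id, []).append((ts, visit_id))
    for history in by_visitor.values():
        history.sort()

    credit = {}
    for booking_id, visitor_id, booked_at in bookings:
        history = by_visitor.get(visitor_id, [])
        times = [ts for ts, _ in history]
        i = bisect_right(times, booked_at)  # number of visits at or before booking
        if i == 0:
            credit[booking_id] = None
            continue
        last_ts, last_visit = history[i - 1]
        # Tie-break: the most recent visit wins, if it is inside the window.
        credit[booking_id] = last_visit if booked_at - last_ts <= window else None
    return credit
```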
(e) Guardrails and Quality Checks
- Guardrail metrics (e.g., latency, bounce rate, listing CTR, cancellations).
- Pre-trend checks, A/A tests, and SRM (sample ratio mismatch) diagnostics.
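An SRM check is usually a chi-square goodness-of-fit test that compares observed unit counts per arm with the designed split. With two arms the statistic has one degree of freedom, so its tail probability can be computed with `math.erfc` and no extra libraries. The function name is hypothetical. An alert threshold such as p < 0.001 is a common convention, not a fixed rule. The same check can be run on A/A test counts and on counts within each country.

```python
import math


def srm_check(n_treatment, n_control, expected_treatment_share=0.5):
    """Chi-square goodness-of-fit test (1 df) for sample ratio mismatch."""
    total = n_treatment + n_control
    exp_t = total * expected_treatment_share
    exp_c = total - exp_t
    chi2 = (n_treatment - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```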
(f) Country Segmentation and Aggregation
- Plan to segment by country and combine strata (e.g., inverse-variance weighting).
- How to detect and report heterogeneity across countries.
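The sketch below shows fixed-effect inverse-variance pooling of per-country effects. It assumes each country's effect and standard error have already been estimated. It also computes Cochran's Q, which is compared against a chi-square distribution with k − 1 degrees of freedom, and I² = max(0, (Q − df)/Q), the share of variation attributed to between-country heterogeneity. If heterogeneity is material, a random-effects model such as DerSimonian-Laird is a common alternative to the fixed-effect pool shown here. The function name is hypothetical.

```python
import math


def pool_country_effects(effects, std_errors):
    """Fixed-effect inverse-variance pooling with Cochran's Q and I^2.

    effects:    per-country treatment effects (e.g. difference in bookings/visits)
    std_errors: matching standard errors (assumed precomputed per country)
    """
    weights = [1 / se**2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: fraction of total variation beyond what sampling error explains.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared
```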
(g) Variance Reduction and Data Quality
- Methods such as CUPED/regression adjustment and stratification.
- Outlier and bot filtering criteria.
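CUPED subtracts the part of the outcome that a pre-experiment covariate predicts: y_adj = y − θ(x − mean(x)), with θ = cov(x, y)/var(x). This preserves the mean of y and shrinks its variance when x and y are correlated. The sketch estimates θ once on units pooled across both arms, then compares arm means of the adjusted values. The covariate, for example a visitor's pre-period bookings per visit, must not be affected by treatment. Function names are hypothetical. Bot and outlier filters are assumed to be applied beforehand.

```python
def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes and theta, estimated on the pooled sample.

    y: in-experiment metric per unit (both arms together)
    x: pre-experiment covariate per unit, unaffected by treatment
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    theta = cov_xy / var_x
    adjusted = [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]
    return adjusted, theta


def cuped_effect(y, x, in_treatment):
    """Treatment-minus-control difference in CUPED-adjusted means."""
    adjusted, _ = cuped_adjust(y, x)
    t = [a for a, flag in zip(adjusted, in_treatment) if flag]
    c = [a for a, flag in zip(adjusted, in_treatment) if not flag]
    return sum(t) / len(t) - sum(c) / len(c)
```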
(h) Power and Duration
- Sample size and duration to detect a 0.5-percentage-point absolute uplift (from 3.0% to 3.5%) at 80% power and alpha = 0.05.
- State the expected daily traffic assumptions and the resulting duration scenarios.
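The sketch below applies the standard normal-approximation sample-size formula for two independent proportions:

n = (z_{1−α/2}·√(2p̄(1−p̄)) + z_{1−β}·√(p_C(1−p_C) + p_T(1−p_T)))² / δ²

For 3.0% vs. 3.5% at α = 0.05 and 80% power, this comes to roughly 19,700 eligible visits per arm. The formula treats visits as independent. With visitor-level randomization and repeated visits, a design effect greater than 1 (hypothetical here; estimate it from historical or A/A data) inflates the requirement. The daily traffic figures are placeholders, not real numbers. Durations are rounded up to whole weeks to cover day-of-week cycles.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80, design_effect=1.0):
    """Visits per arm for a two-sided two-proportion test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    delta = abs(p_treatment - p_control)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    ) ** 2
    # design_effect > 1 accounts for correlated visits from the same visitor.
    return math.ceil(design_effect * numerator / delta**2)


def duration_days(visits_per_arm, daily_eligible_visits, arms=2):
    """Days needed at a given daily triggered traffic, rounded up to whole weeks."""
    days = math.ceil(arms * visits_per_arm / daily_eligible_visits)
    return 7 * math.ceil(days / 7)


n = n_per_arm(0.030, 0.035)
# Hypothetical daily eligible-visit scenarios; replace with real traffic.
for daily in (2_000, 10_000, 50_000):
    print(daily, duration_days(n, daily))
```

With a hypothetical 2,000 eligible visits per day the test needs 21 days. At high traffic the arithmetic needs only a day or two, but the whole-week rounding keeps the minimum at one week. Novelty effects may still argue for running longer.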
(i) Analysis Plan: ITT vs Triggered vs Clickers
- Compare intention-to-treat (ITT) vs triggered analyses given only a subset will click the filter.
- How to estimate the effect among compliers/clickers without selection bias (e.g., instrumental variables/CACE).
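Control users never see the filter, so noncompliance is one-sided. The Wald (IV) estimator, with random assignment as the instrument, then reduces to the ITT effect divided by the treatment-arm click rate. This avoids the selection bias of comparing clickers directly against control users. It relies on the exclusion restriction: merely displaying the filter must not change conversion for users who never click it. That assumption is questionable here, because a visible filter may itself change browsing behavior. The sketch returns point estimates only; standard errors would come from 2SLS or the delta method. The function name is hypothetical.

```python
def itt_and_cace(conv_treatment, conv_control, click_rate_treatment,
                 click_rate_control=0.0):
    """ITT effect and Wald (IV) estimate of the complier average causal effect.

    Assignment is the instrument; clicking the WFH filter is the take-up.
    With the filter hidden in control, click_rate_control is 0.
    Valid only under the exclusion restriction.
    """
    itt = conv_treatment - conv_control
    first_stage = click_rate_treatment - click_rate_control
    if first_stage <= 0:
        raise ValueError("first stage (click-rate difference) must be positive")
    return itt, itt / first_stage
```

For example, a 0.2-point ITT lift with a 20% click rate implies a 1-point lift among compliers, if the exclusion restriction holds.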
(j) Risks and Mitigations
- Identify key risks (novelty effects, cannibalization, latency regressions, misattribution across visits) and how to mitigate them.
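A simple novelty diagnostic splits the treatment effect by weeks since each visitor's first exposure. An effect that shrinks over successive weeks is consistent with novelty. This is only a diagnostic: later-week cohorts contain only returning visitors, so their composition differs from week 0. The row layout below is a hypothetical pre-aggregated table.

```python
from collections import defaultdict


def effect_by_exposure_week(rows):
    """Treatment-minus-control conversion by weeks since first exposure.

    rows: iterable of (arm, weeks_since_first_exposure, visits, bookings).
    """
    totals = defaultdict(lambda: {"treatment": [0, 0], "control": [0, 0]})
    for arm, week, visits, bookings in rows:
        totals[week][arm][0] += visits
        totals[week][arm][1] += bookings
    effects = {}
    for week in sorted(totals):
        t_visits, t_bookings = totals[week]["treatment"]
        c_visits, c_bookings = totals[week]["control"]
        # Only report weeks where both arms have traffic.
        if t_visits and c_visits:
            effects[week] = t_bookings / t_visits - c_bookings / c_visits
    return effects
```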