Experiment Design: Downranking Suspected Bad Sellers in Search
Context
- You are designing a decision framework and online experiment to test penalizing sellers suspected of bad behavior (e.g., fraud, policy violations, poor quality) in marketplace search results. A risk model scores sellers; the intervention alters ranking for items from higher-risk sellers.
- Goal: Reduce harmful outcomes without materially hurting buyer experience, marketplace liquidity, pricing, or fairness.
Tasks
(a) Precisely define the treatment
- Specify exactly how the ranking will be modified for items from risk-scored sellers (e.g., push flagged items down by k ranks, or apply a multiplicative penalty to the ranking score); both variants are sketched below.
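For concreteness, a minimal Python sketch of both treatment variants, assuming a production ranker that emits a base relevance score and a seller risk score in [0, 1]; the threshold and penalty values are hypothetical placeholders to be tuned in the ramp.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    base_score: float   # relevance score from the production ranker
    seller_risk: float  # risk-model score in [0, 1]

# Hypothetical parameters; the right values come out of the ramp in (d).
RISK_THRESHOLD = 0.8    # only penalize sellers scored above this
PENALTY_FACTOR = 0.7    # variant 1: multiplicative score penalty
DEMOTION_RANKS = 5      # variant 2: fixed rank demotion

def penalized_score(item: Item) -> float:
    """Variant 1: multiply the ranking score of flagged items."""
    if item.seller_risk >= RISK_THRESHOLD:
        return item.base_score * PENALTY_FACTOR
    return item.base_score

def rank_with_penalty(items: list[Item]) -> list[Item]:
    return sorted(items, key=penalized_score, reverse=True)

def rank_with_demotion(items: list[Item], k: int = DEMOTION_RANKS) -> list[Item]:
    """Variant 2: rank normally, then push each flagged item down k slots."""
    out = sorted(items, key=lambda i: i.base_score, reverse=True)
    for pos in range(len(out) - 1, -1, -1):  # back-to-front keeps unvisited indices stable
        if out[pos].seller_risk >= RISK_THRESHOLD:
            out.insert(min(pos + k, len(out) - 1), out.pop(pos))
    return out
```

The multiplicative variant preserves score geometry (a strongly relevant item can still surface), while fixed demotion gives a more predictable exposure reduction; the experiment can arbitrate between them.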
(b) Choose a randomization unit that controls interference
- Compare session-level, query-level, and seller-level cluster randomization.
- Justify a choice and describe how to prevent cross-arm contamination within the same search page (see the assignment sketch after this list).
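One common way to prevent within-page contamination is to randomize at a unit that spans the whole results page (session or user) via deterministic salted hashing, so every item on a page is ranked under a single policy. A sketch, with the salt name hypothetical:

```python
import hashlib

def assign_arm(unit_id: str,
               salt: str = "bad-seller-downrank-v1",  # hypothetical experiment salt
               treatment_fraction: float = 0.05) -> str:
    """Deterministically hash a randomization unit (e.g., session_id) into an
    arm. Because assignment depends only on the unit, every query in the
    session -- and every item on a results page -- sees one consistent policy."""
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return "treatment" if bucket < treatment_fraction else "control"
```

Seller-level clustering better captures seller-side interference (treated sellers losing sales to control sellers) but mixes arms on a single page; session-level assignment avoids that, at the cost of leaving marketplace-level spillovers to be bounded separately.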
(c) Define primary success metrics and guardrails with exact formulas
- Include numerators/denominators/units for: chargeback_rate, complaints_per_1k_orders, bad_seller_impressions_share, GMV, add-to-cart rate, search CTR, price index, selection coverage, latency, etc.; several of these are illustrated in the sketch below.
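As an illustration of the expected precision, a sketch computing several of these from per-arm event counts; field names and denominator choices (e.g., add-to-cart per click rather than per search) are assumptions to be pinned down in the metric spec:

```python
from dataclasses import dataclass

@dataclass
class ArmCounts:
    # Raw per-arm event counts over the analysis window (names illustrative).
    orders: int
    chargebacks: int
    complaints: int
    impressions_total: int
    impressions_flagged: int  # impressions of items from flagged sellers
    searches: int
    clicks: int
    add_to_carts: int
    gmv_usd: float

def metrics(c: ArmCounts) -> dict[str, float]:
    return {
        "chargeback_rate": c.chargebacks / c.orders,                  # per order
        "complaints_per_1k_orders": 1_000 * c.complaints / c.orders,  # per 1k orders
        "bad_seller_impressions_share": c.impressions_flagged / c.impressions_total,
        "search_ctr": c.clicks / c.searches,                          # clicks per search
        "add_to_cart_rate": c.add_to_carts / c.clicks,                # ATCs per click
        "gmv_per_order_usd": c.gmv_usd / c.orders,                    # USD
    }
```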
(d) Propose a ramp plan with stop/go criteria and a pre-specified analysis window
- Example ramp: 1% → 5% → 10% → 50%, with explicit stop/go criteria at each stage.
- Include minimal detectable effect (MDE) assumptions and a sample size plan, especially for rare-event metrics (see the power-calculation sketch below).
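A standard two-proportion power calculation (normal approximation) makes the rare-event problem concrete; stdlib only, with the example numbers purely illustrative:

```python
import math
from statistics import NormalDist

def n_per_arm(p_base: float, rel_mde: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per arm for a two-proportion z-test, assuming we want to
    detect a relative reduction of rel_mde from baseline rate p_base."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_alt = p_base * (1 - rel_mde)
    var = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_a + z_b) ** 2 * var / (p_base - p_alt) ** 2)

# Illustrative: detecting a 10% relative drop in a 0.5% chargeback rate needs
# n_per_arm(0.005, 0.10) ≈ 297,000 orders per arm, which is why the 1% ramp
# stage is nearly uninformative for rare-event guardrails.
```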
(e) Handle model uncertainty
- Explain how offline precision/recall and false positives shape the expected treatment effect.
- Propose stratifying the penalty by risk-score band, or tuning penalty strength with a bandit (a sketch of both follows).
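The dilution argument in the first bullet can be made explicit with back-of-envelope arithmetic, and the second bullet amounts to a risk-band-to-penalty mapping; all numbers below are hypothetical:

```python
def expected_net_effect(precision: float,
                        harm_reduction_per_tp: float,
                        cost_per_fp: float,
                        flagged_exposures: int) -> float:
    """Realized effect is diluted by model precision: true positives deliver
    the harm reduction, while false positives only pay the guardrail cost
    (lost GMV, worse relevance) with no offsetting benefit."""
    tp = precision * flagged_exposures
    fp = (1 - precision) * flagged_exposures
    return tp * harm_reduction_per_tp - fp * cost_per_fp

# Stratified penalty: demote harder where the model is more certain.
# Band edges and factors are placeholders; a bandit could tune the factor
# within each band against a reward combining harm and guardrail metrics.
PENALTY_BY_RISK_BAND = {
    (0.95, 1.00): 0.5,
    (0.85, 0.95): 0.7,
    (0.70, 0.85): 0.9,
}
```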
(f) Heterogeneity and unintended effects
- Identify heterogeneity to check (e.g., new-seller cold start, category/geography fairness) and how to mitigate any issues found; a per-segment readout sketch follows.
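A minimal per-segment readout, assuming flat rows with an arm label, a segment key (tenure band, category, geography), and a metric value; field names are illustrative:

```python
from collections import defaultdict

def per_segment_effects(rows: list[dict], segment_key: str, metric_key: str) -> dict:
    """Treatment-minus-control mean of a metric within each segment, to flag
    subgroups (e.g., new sellers) bearing a disproportionate share of the cost."""
    acc = defaultdict(lambda: {"treatment": [0.0, 0], "control": [0.0, 0]})
    for r in rows:
        s = acc[r[segment_key]][r["arm"]]
        s[0] += r[metric_key]  # running metric sum
        s[1] += 1              # running row count
    effects = {}
    for seg, arms in acc.items():
        (t_sum, t_n), (c_sum, c_n) = arms["treatment"], arms["control"]
        if t_n and c_n:
            effects[seg] = t_sum / t_n - c_sum / c_n
    return effects
```

Correct for multiple comparisons (many segments × many metrics) before acting on any single subgroup delta.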