Experiment and Metric Plan: New Shopping Module Embedded in the Pins Feed
Context
You are introducing a Shopping module directly into the Pins feed. The goal is to assess whether this module increases shopping outcomes without harming overall user and monetization health. Design a metrics plan and an experiment that:
-
Selects a clear primary success metric and 3–5 guardrail metrics (with precise definitions).
-
Includes at least one intermediate/funnel metric for diagnosis.
-
Addresses spillover/interference (e.g., repins/shares exposing control users) and learning/novelty effects with a concrete experimental design.
-
Specifies power and MDE assumptions, handles seasonality, and includes an AA and CUPED plan.
-
Provides a decision framework given mixed results.
Tasks
-
Metrics
-
Pick one primary success metric and 3–5 guardrail metrics. For each metric, define numerator/denominator, unit of analysis, and aggregation window. Include at least one intermediate/funnel metric (e.g., clicks on the new module, time spent in Shopping).
-
Discuss pros/cons of using DAU and user time spent as primaries versus alternatives (e.g., Shopping CTR, Add-to-Cart Rate, GMV/DAU). For guardrails, specify acceptable directions and magnitudes of change.
-
Experiment Design for Spillover and Learning Effects
-
Choose and justify one concrete design: user-level randomization with exposure-logging and adjacency tests; cluster/geo randomization; switchback (time-based); or two-stage saturation design.
-
Detail: randomization unit, eligibility/exposure rules, cooldown, novelty burn-in, and how you would detect/quantify spillover (e.g., graph distance, household/geo adjacency) and learning (e.g., time-on-feature slope).
-
Power, MDE, and Analysis Hygiene
-
State assumptions for baseline rates/variances, intra-cluster correlation if clustered, horizon length, and how you will handle seasonality and peaky traffic (weekly cycles).
-
Include an AA test and CUPED (covariate adjustment) plan.
-
Decision Framework with Example Results
-
After a 21-day test, suppose you observe: +2.3% lift in Shopping CTR, +1.1% in GMV/user, −0.6% in overall time spent, and −0.2% in DAU.
-
Describe a net weighted lift (or multi-objective) rubric, guardrail thresholds, sensitivity to long-term effects, and what you would recommend to the PM.
-
List additional diagnostics you would run before rollout (e.g., user/creator segment heterogeneity, ad revenue cannibalization, repeat usage vs. novelty).