A/B Test Metrics and Guardrails for Quick Reply (1-week)
Context
You are adding a Quick Reply feature (suggested reply chips in the DM composer) to a messaging app and will run a 1-week A/B test from 2025-08-25 to 2025-09-01 (seven full days if the end date is treated as exclusive). Define the success metric and guardrails with precise denominators, units of analysis, and attribution windows.
Tasks
- Primary Success Metric (North Star)
  - Define a single primary metric that captures meaningful user value from Quick Reply.
  - Provide: unit of analysis (e.g., user or user-day), numerator, denominator, inclusion criteria (including a precise exposure definition), and a 24-hour attribution rule from click to reply send.
Guardrail Metrics
-
Propose at least two guardrails to protect long-term health (e.g., reply quality/abuse rate, churn/retention, app crashes).
-
For each guardrail, specify: measurement unit, exact formula, and acceptable movement thresholds.
-
Interpreting Mixed Signals
-
Suppose overall reply send rate increases, but average conversation length decreases and complaint rate from Ads-acquired users rises.
-
Explain how you would segment and interpret the metrics to avoid Simpson’s paradox, and decide whether to ship, hold, or iterate.
-
Detecting Empty Engagement
-
Define a metric to detect accidental taps or replies deleted before send.
-
Describe how to implement it using existing telemetry and how it would influence the launch decision.
-
Pre-analysis Plan
-
Describe how you will pre-register primary/secondary metrics, define stopping rules, and prevent metric fishing.
-
Include how you would validate that "exposed" truly means the user saw the Quick Reply entry point (not just was eligible).
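The exposure validation could start from a simple audit that compares users flagged as exposed against client-side render events. The sketch below assumes a hypothetical render log that records how long the chips were visible per user, and the 500 ms visibility floor is an arbitrary illustrative choice rather than a recommended value.

```python
def exposure_audit(flagged_exposed, render_events, min_visible_ms=500):
    """Check how many users flagged as exposed have a confirmed on-screen render.

    flagged_exposed: iterable of user_ids the pipeline marks as exposed
    render_events:   iterable of (user_id, visible_ms) pairs (hypothetical log)
    """
    seen = {user for user, visible_ms in render_events if visible_ms >= min_visible_ms}
    flagged = set(flagged_exposed)
    confirmed = flagged & seen
    return {
        "confirmed_share": len(confirmed) / len(flagged) if flagged else 0.0,
        "unconfirmed_users": sorted(flagged - seen),
    }

# Made-up example: u2 rendered too briefly, u4 has no render event at all.
audit = exposure_audit(
    ["u1", "u2", "u3", "u4"],
    [("u1", 1200), ("u2", 100), ("u3", 800)],
)
```

A low confirmed share would suggest the exposure flag is tracking eligibility rather than actual views, in which case the denominator of the primary metric should be rebuilt from render events.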