Context
You are launching an online A/B test for a new version of a recommendation algorithm. The goal of the new algorithm is to increase users’ exploration behavior (discovering new or diverse content). A known challenge is that, in the short term, the new algorithm may slightly reduce core engagement metrics (e.g., like rate).
Tasks
- Experiment design: How would you design and run this experiment to decide whether to launch the new algorithm?
- Metric definition:
  - How would you define and measure “exploration”?
  - How would you define and measure long-term user satisfaction?
  - Besides short-term engagement (e.g., like rate), what long-term retention / ecosystem health metrics would you track?
- Limited duration constraint: If the experiment window is limited and you cannot fully observe long-term impact, how would you analyze, interpret, and make a launch recommendation (e.g., ship / don’t ship / ramp gradually)?
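To make the "define and measure exploration" task concrete, here is one possible (not the only) operationalization, assuming each logged interaction carries a topic label: topic-distribution entropy and the share of interactions in topics absent from the user's history. The function names and the idea of a per-user novelty rate are illustrative choices, not part of the prompt.

```python
# Sketch: two candidate per-user exploration metrics, assuming
# interaction logs where each item has a topic/category label.
import math
from collections import Counter

def topic_entropy(topics):
    """Shannon entropy (bits) of a user's consumed-topic distribution.
    Higher entropy = more diverse consumption."""
    counts = Counter(topics)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def novel_topic_rate(session_topics, history_topics):
    """Share of interactions in topics the user has no prior history with."""
    seen = set(history_topics)
    if not session_topics:
        return 0.0
    return sum(1 for t in session_topics if t not in seen) / len(session_topics)

print(topic_entropy(["sports", "sports", "music", "news"]))  # 1.5 bits
print(novel_topic_rate(["cooking", "sports"], ["sports"]))   # 0.5
```

Either metric can be averaged per user per day and then compared across arms; entropy rewards diversity overall, while the novelty rate isolates genuinely new topics.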
Assumptions (you may state and adjust)
- Randomization unit can be user-level.
- You have event logs (impressions, clicks, likes, dwell time, follows, hides/blocks), content metadata (topic/category/creator), and user history.
- You can run holdouts or ramps if needed.
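Given user-level randomization, the basic readout is a difference in per-user metric means between arms. A minimal sketch, using synthetic data and a normal-approximation confidence interval with unpooled variances (the metric values and the +2pp simulated lift below are invented for illustration):

```python
# Sketch: user-level treatment vs. control comparison on a per-user
# metric (e.g., novel-topic rate). Data is synthetic.
import math
import random

def diff_in_means_ci(treat, control, z=1.96):
    """Return mean(treat) - mean(control) and its ~95% CI (unpooled variances)."""
    mt = sum(treat) / len(treat)
    mc = sum(control) / len(control)
    vt = sum((x - mt) ** 2 for x in treat) / (len(treat) - 1)
    vc = sum((x - mc) ** 2 for x in control) / (len(control) - 1)
    se = math.sqrt(vt / len(treat) + vc / len(control))
    d = mt - mc
    return d, (d - z * se, d + z * se)

random.seed(0)
control = [random.gauss(0.20, 0.05) for _ in range(5000)]
treat = [random.gauss(0.22, 0.05) for _ in range(5000)]  # simulated +2pp lift
d, (lo, hi) = diff_in_means_ci(treat, control)
print(f"diff={d:.4f}, 95% CI=({lo:.4f}, {hi:.4f})")
```

A CI that excludes zero on the exploration metric, read alongside guardrail CIs on like rate and retention proxies, is one way to frame the ship / don't ship / ramp decision under a limited window.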