This question evaluates proficiency in online experimentation and sequential decision-making, covering Bayesian bandits (Thompson Sampling), safety guardrails for churn, delayed-conversion handling, traffic allocation and stopping/rollback policy design.

You are running an online experiment with 3 variants (including control). The primary objective is to maximize conversions. There is a hard guardrail on churn: any increase in churn above a specified tolerance must trigger mitigation. Conversions are delayed relative to exposure.
(a) Design a Thompson Sampling bandit:
(b) Set traffic floors and fairness constraints across variants and key strata.
(c) Define stopping and rollback policies when the churn guardrail is breached.
(d) Compare expected regret and business impact against a fixed-horizon A/B test under seasonality.
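
As one possible starting point for parts (a) to (c), the sketch below shows a Beta-Bernoulli Thompson Sampling allocation with a per-variant traffic floor, a posterior check on the churn guardrail, and a simple way to handle delayed conversions by counting only exposures whose conversion window has closed. Everything here is an illustrative assumption rather than part of the question: the function names (`matured_counts`, `allocation_probs`, `churn_breach_prob`), the uniform Beta(1, 1) priors, the floor of 0.1, and the use of control churn plus a tolerance as the guardrail threshold.

```python
import random

def matured_counts(exposures, now, window):
    """Count conversions / non-conversions only for exposures whose
    conversion window has fully elapsed (assumed handling of delay).
    `exposures` is a list of (exposure_time, converted) tuples."""
    done = [conv for (t, conv) in exposures if now - t >= window]
    successes = sum(1 for c in done if c)
    return successes, len(done) - successes

def allocation_probs(successes, failures, floor=0.1, n_draws=2000, rng=None):
    """Estimate Thompson Sampling traffic shares with a per-arm floor.
    Assumes Beta(1, 1) priors and len(successes) * floor <= 1."""
    rng = rng or random.Random(0)
    k = len(successes)
    wins = [0] * k
    for _ in range(n_draws):
        # One posterior draw of the conversion rate per arm.
        draws = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        wins[draws.index(max(draws))] += 1
    raw = [w / n_draws for w in wins]
    # Mix with a uniform floor so every arm keeps at least `floor` traffic.
    return [floor + (1 - k * floor) * p for p in raw]

def churn_breach_prob(churn_ctrl, n_ctrl, churn_var, n_var,
                      tolerance, n_draws=2000, rng=None):
    """Posterior probability that the variant's churn rate exceeds the
    control's by more than `tolerance` (assumed guardrail definition)."""
    rng = rng or random.Random(1)
    breaches = 0
    for _ in range(n_draws):
        c = rng.betavariate(1 + churn_ctrl, 1 + n_ctrl - churn_ctrl)
        v = rng.betavariate(1 + churn_var, 1 + n_var - churn_var)
        if v - c > tolerance:
            breaches += 1
    return breaches / n_draws

if __name__ == "__main__":
    # Hypothetical counts: arm 0 is control.
    probs = allocation_probs([50, 10, 10], [50, 90, 90], floor=0.1)
    print("traffic shares:", probs)
    print("breach probability:", churn_breach_prob(10, 1000, 100, 1000, 0.02))
```

A rollback rule could then, for example, set a variant's share to zero and renormalize the remaining shares whenever its breach probability exceeds a chosen threshold such as 0.95; that threshold is likewise an assumption to be tuned against the business cost of churn.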