Deploy multi-armed bandits safely

This question evaluates proficiency in online experimentation and sequential decision-making. It covers Bayesian bandits (Thompson Sampling), safety guardrails for churn, delayed-conversion handling, traffic allocation, and the design of stopping and rollback policies.

Question

Online bandit with 3 variants, churn guardrail, and delayed conversions

Context

You are running an online experiment with 3 variants (including control). The primary objective is to maximize conversions. There is a hard guardrail on churn: any increase in churn above a specified tolerance must trigger mitigation. Conversions are delayed relative to exposure.

Tasks

(a) Design a Thompson Sampling bandit:

Specify likelihoods and priors for both the primary metric and the churn guardrail.
Explain how you will handle delayed feedback (optimistic versus debiased estimators) and non-stationarity (e.g., discounting or sliding windows).

(b) Set traffic floors and fairness constraints across variants and key strata.

(c) Define stopping and rollback policies when the churn guardrail is breached.

(d) Compare expected regret and business impact against a fixed-horizon A/B test under seasonality.
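
As a starting point for tasks (a) and (b), the following is a minimal sketch of Thompson Sampling with a churn filter and a traffic floor. It is not a reference solution. It assumes Bernoulli likelihoods with Beta(1, 1) priors for both conversion and churn, treats arm 0 as control, and updates only on observations whose outcome window has closed. The names `Arm`, `allocate`, `floor`, and `n_draws` are hypothetical and chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    # Beta(1, 1) priors are an illustrative assumption, not part of the question.
    conv_a: float = 1.0
    conv_b: float = 1.0
    churn_a: float = 1.0
    churn_b: float = 1.0

    def update(self, converted: bool, churned: bool) -> None:
        """Conjugate Beta-Bernoulli update from one matured observation."""
        self.conv_a += converted
        self.conv_b += not converted
        self.churn_a += churned
        self.churn_b += not churned


def allocate(arms, g_max, floor=0.1, n_draws=2000, rng=random):
    """Estimate traffic shares by repeated posterior sampling.

    arms[0] is control. In each draw an arm is eligible only if its sampled
    churn exceeds control's sampled churn by at most g_max; the eligible arm
    with the highest sampled conversion rate wins the draw. Control is always
    eligible when g_max >= 0.
    """
    k = len(arms)
    if k * floor > 1.0:
        raise ValueError("floor is too large for the number of arms")
    wins = [0] * k
    for _ in range(n_draws):
        conv = [rng.betavariate(a.conv_a, a.conv_b) for a in arms]
        churn = [rng.betavariate(a.churn_a, a.churn_b) for a in arms]
        eligible = [i for i in range(k) if churn[i] - churn[0] <= g_max]
        best = max(eligible, key=lambda i: conv[i])
        wins[best] += 1
    shares = [w / n_draws for w in wins]
    # Mix with a uniform floor so every arm keeps at least `floor` of traffic.
    return [floor + (1.0 - k * floor) * s for s in shares]
```

In this sketch the floor applies to every arm, including one that currently looks unsafe. Removing traffic from such an arm is left to the rollback policy in task (c). The same function could be called once per stratum if per-segment floors are required.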

Assumptions to make explicit

3 variants include control.
Primary outcome is binary conversion within a defined window; churn is a binary event within a defined window; both can arrive with delay.
The guardrail tolerance is an absolute churn-increase threshold g_max (set g_max = 0 if no increase is allowed).
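
One way to make these assumptions operational is sketched below. It counts only exposures whose outcome window has closed, then estimates the posterior probability that a variant's churn exceeds control's by more than g_max. The Beta(1, 1) prior, the 0.95 rollback threshold, the event dictionary layout, and the function names are all illustrative assumptions, not part of the question.

```python
import random


def matured(events, now, window):
    """Keep exposures whose outcome window has closed.

    Each event is a dict with an "exposed_at" timestamp (same units as
    `window`). Dropping immature exposures avoids counting outcomes that
    have not arrived yet as non-events.
    """
    return [e for e in events if now - e["exposed_at"] >= window]


def breach_probability(control, variant, g_max, n_draws=5000, rng=random):
    """Monte Carlo estimate of P(churn_variant - churn_control > g_max).

    `control` and `variant` are (churned, retained) counts from matured
    exposures; a Beta(1, 1) prior is assumed for each churn rate.
    """
    hits = 0
    for _ in range(n_draws):
        p_control = rng.betavariate(1 + control[0], 1 + control[1])
        p_variant = rng.betavariate(1 + variant[0], 1 + variant[1])
        hits += (p_variant - p_control) > g_max
    return hits / n_draws


def should_roll_back(control, variant, g_max, threshold=0.95, **kwargs):
    """Flag a rollback when the breach probability reaches `threshold`."""
    return breach_probability(control, variant, g_max, **kwargs) >= threshold
```

For example, `should_roll_back((10, 990), (100, 900), g_max=0.02)` compares roughly 1% control churn against roughly 10% variant churn. Filtering to matured exposures trades speed for lower bias; a faster alternative would need an explicit model of the delay distribution.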
