PracHub

Deploy multi-armed bandits safely

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in online experimentation and sequential decision-making, covering Bayesian bandits (Thompson Sampling), safety guardrails for churn, delayed-conversion handling, traffic allocation, and stopping/rollback policy design.

Company: Meta

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

You have 3 variants, a churn guardrail, and delayed conversions. a) Design a Thompson Sampling bandit (specify likelihoods), and explain how you’ll handle delays (e.g., optimistic vs. debiased estimators) and non‑stationarity (e.g., discounting). b) Set traffic floors and fairness constraints. c) Define stopping and rollback policies when a guardrail is breached. d) Compare expected regret and business impact versus a fixed‑horizon A/B under seasonality.

Oct 13, 2025, 9:49 PM

Online bandit with 3 variants, churn guardrail, and delayed conversions

Context

You are running an online experiment with 3 variants (including control). The primary objective is to maximize conversions. There is a hard guardrail on churn: any increase in churn above a specified tolerance must trigger mitigation. Conversions are delayed relative to exposure.

Tasks

(a) Design a Thompson Sampling bandit:

  • Specify likelihoods and priors for both the primary metric and the churn guardrail.
  • Explain how you will handle delayed feedback (optimistic versus debiased estimators) and non-stationarity (e.g., discounting or sliding windows).
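One way to make task (a) concrete is a Beta-Bernoulli model: each variant keeps one Beta posterior for conversion and, separately, one for churn, and old evidence is geometrically discounted to cope with non-stationarity. The sketch below is illustrative only. The class name `BetaArm`, the uniform Beta(1, 1) prior, and the discount parameter `gamma` are assumptions, not part of the question.

```python
import random


class BetaArm:
    """Beta posterior over a Bernoulli rate (usable for conversion or churn)."""

    def __init__(self, prior_alpha=1.0, prior_beta=1.0):
        # Assumed prior: Beta(1, 1), i.e. uniform over the rate.
        self.prior_alpha = prior_alpha
        self.prior_beta = prior_beta
        self.alpha = prior_alpha
        self.beta = prior_beta

    def update(self, successes, failures, gamma=1.0):
        # Shrink accumulated evidence toward the prior by gamma (0 < gamma <= 1),
        # then add the new batch. gamma = 1.0 is the ordinary stationary update.
        self.alpha = self.prior_alpha + gamma * (self.alpha - self.prior_alpha) + successes
        self.beta = self.prior_beta + gamma * (self.beta - self.prior_beta) + failures

    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)

    def mean(self):
        return self.alpha / (self.alpha + self.beta)


def select_arm(conversion_arms, rng):
    """Thompson Sampling step: draw one rate per arm and play the largest draw."""
    draws = [arm.sample(rng) for arm in conversion_arms]
    return max(range(len(draws)), key=draws.__getitem__)
```

With `gamma` below 1, an arm that stops receiving data drifts back toward its prior, which keeps exploration alive when rates shift. A sliding window is an alternative with a similar effect.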

(b) Set traffic floors and fairness constraints across variants and key strata.
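A traffic floor can be expressed as a mixture between the bandit's allocation and a uniform allocation, applied independently within each stratum. The following is a minimal sketch under that assumption. The function name `apply_traffic_floor`, the 10% floor, and the stratum names are hypothetical.

```python
def apply_traffic_floor(probs, floor):
    """Return allocation probabilities in which every arm gets at least `floor`.

    The bandit's probabilities are rescaled into the mass left over after
    each arm has been given its guaranteed share.
    """
    k = len(probs)
    if floor < 0 or floor * k > 1.0:
        raise ValueError("floor must satisfy 0 <= floor <= 1 / number_of_arms")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probs must have positive total mass")
    free_mass = 1.0 - floor * k
    return [floor + free_mass * p / total for p in probs]


# Hypothetical per-stratum bandit allocations (control first).
bandit_allocation = {
    "new_users": [0.90, 0.10, 0.00],
    "returning_users": [0.20, 0.30, 0.50],
}
floored = {s: apply_traffic_floor(p, floor=0.10) for s, p in bandit_allocation.items()}
```

Applying the floor per stratum, rather than only in aggregate, is what keeps a small segment from being starved of one variant entirely.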

(c) Define stopping and rollback policies when the churn guardrail is breached.
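A guardrail policy for task (c) can be phrased in terms of the posterior probability that a variant's churn exceeds control's churn by more than the tolerance. The sketch below estimates that probability by Monte Carlo from Beta posteriors. The function names and the `pause_at` and `rollback_at` thresholds are illustrative assumptions, not values given in the question.

```python
import random


def prob_guardrail_breach(arm_churn, control_churn, g_max, n_draws=20000, seed=0):
    """Estimate P(churn_arm - churn_control > g_max) under Beta posteriors.

    `arm_churn` and `control_churn` are (alpha, beta) tuples.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        diff = rng.betavariate(*arm_churn) - rng.betavariate(*control_churn)
        if diff > g_max:
            hits += 1
    return hits / n_draws


def guardrail_action(p_breach, pause_at=0.80, rollback_at=0.95):
    """Map a breach probability to an action (thresholds are hypothetical)."""
    if p_breach >= rollback_at:
        return "rollback"  # send the variant's traffic back to control
    if p_breach >= pause_at:
        return "pause"     # hold the variant at its floor pending more data
    return "continue"
```

A two-level policy like this separates a reversible response (pause) from a terminal one (rollback), so a noisy early reading does not permanently kill a variant.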

(d) Compare expected regret and business impact against a fixed-horizon A/B test under seasonality.
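For task (d), a toy simulation can make the regret comparison tangible. The sketch below is built on strong simplifying assumptions: hypothetical true rates in `BASE_RATES`, a sinusoidal seasonal multiplier shared by all arms (so the best arm never changes), immediate feedback, and no guardrail. It compares cumulative pseudo-regret for Thompson Sampling against a uniform split held for the whole horizon.

```python
import math
import random

BASE_RATES = [0.05, 0.06, 0.10]  # hypothetical true conversion rates


def seasonal_factor(t, period=7000):
    # Assumed multiplicative seasonality, identical for every arm.
    return 1.0 + 0.3 * math.sin(2.0 * math.pi * t / period)


def simulate(policy, horizon=20000, seed=0):
    """Return cumulative pseudo-regret for 'thompson' or 'ab' (uniform split)."""
    rng = random.Random(seed)
    k = len(BASE_RATES)
    alpha = [1.0] * k
    beta = [1.0] * k
    regret = 0.0
    for t in range(horizon):
        rates = [min(1.0, r * seasonal_factor(t)) for r in BASE_RATES]
        if policy == "thompson":
            draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            arm = max(range(k), key=draws.__getitem__)
        else:
            arm = rng.randrange(k)  # fixed-horizon A/B: equal split throughout
        reward = 1 if rng.random() < rates[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        # Pseudo-regret uses true rates, so it is not affected by reward noise.
        regret += max(rates) - rates[arm]
    return regret


ts_regret = simulate("thompson")
ab_regret = simulate("ab")
```

If seasonality changed which arm is best, the picture would differ: an undiscounted bandit can lock onto a stale winner, while a fixed-horizon test that spans whole seasonal cycles averages over them.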

Assumptions to make explicit

  • 3 variants include control.
  • Primary outcome is binary conversion within a defined window; churn is a binary event within a defined window; both can arrive with delay.
  • Guardrail tolerance is an absolute threshold g_max on the increase in churn relative to control (set g_max = 0 to allow no increase).
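Because outcomes arrive with delay, counting every still-pending exposure as a non-conversion biases a rate downward, and the bias is worst for arms that received traffic recently. One common correction weights each exposure by the probability that its outcome would already have been observed. The sketch below assumes an exponential delay distribution with a known mean, which is an illustrative choice rather than anything stated in the question. The function names are likewise hypothetical, and truncation at the attribution window is ignored for brevity.

```python
import math


def delay_cdf(age_days, mean_delay_days=2.0):
    # Assumed exponential delay: P(outcome observed within `age_days`).
    return 1.0 - math.exp(-age_days / mean_delay_days)


def naive_rate(observed_conversions, exposure_ages):
    """Treats every pending exposure as a non-conversion (biased low)."""
    return observed_conversions / len(exposure_ages)


def debiased_rate(observed_conversions, exposure_ages, mean_delay_days=2.0):
    """Divides by the expected number of exposures whose outcome is observable."""
    effective = sum(delay_cdf(a, mean_delay_days) for a in exposure_ages)
    return observed_conversions / effective if effective > 0 else 0.0


# 100 exposures, each old enough that half of eventual conversions have arrived.
ages = [2.0 * math.log(2.0)] * 100
naive = naive_rate(5, ages)
debiased = debiased_rate(5, ages)
```

In this example the naive estimate is 0.05 while the debiased one is about 0.10, since only half of the eventual conversions are expected to be visible yet. The same correction applies to the churn guardrail, where underestimating a delayed harm is the more dangerous error.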
