Explain Multi-Armed Bandit Principles
Company: Amazon
Role: Machine Learning Engineer
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Onsite
Quick Answer: This question evaluates understanding of multi-armed bandits and contextual bandits: the exploration-exploitation trade-off, regret, and the modeling assumptions behind epsilon-greedy, UCB, and Thompson sampling, plus operational concerns such as delayed or batched rewards, non-stationarity, offline policy evaluation, and production safety. It is commonly asked in Analytics & Experimentation and machine learning interviews because it probes both the theory of online decision-making and its practical side: selecting an algorithm, reasoning about performance trade-offs, and deploying it safely.
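To make the three algorithms concrete, here is a minimal sketch of epsilon-greedy, UCB1, and Thompson sampling on a simulated Bernoulli bandit. The arm means, step counts, and epsilon value are illustrative assumptions, not part of the question; a real answer would also discuss regret bounds and the operational concerns above.

```python
import math
import random

# Hypothetical arm success probabilities, chosen only for illustration.
TRUE_MEANS = [0.2, 0.5, 0.8]
N_ARMS = len(TRUE_MEANS)


def pull(arm, rng):
    """Simulate one Bernoulli reward from the chosen arm."""
    return 1 if rng.random() < TRUE_MEANS[arm] else 0


def epsilon_greedy(steps, epsilon, rng):
    """Explore uniformly with probability epsilon, else play the best empirical arm."""
    counts, values, total = [0] * N_ARMS, [0.0] * N_ARMS, 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(N_ARMS)                      # explore
        else:
            arm = max(range(N_ARMS), key=lambda a: values[a])  # exploit
        r = pull(arm, rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
        total += r
    return total


def ucb1(steps, rng):
    """Optimism in the face of uncertainty: mean plus a confidence bonus."""
    counts, values, total = [0] * N_ARMS, [0.0] * N_ARMS, 0
    for t in range(1, steps + 1):
        if t <= N_ARMS:
            arm = t - 1  # initialize by playing each arm once
        else:
            arm = max(
                range(N_ARMS),
                key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        r = pull(arm, rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
        total += r
    return total


def thompson(steps, rng):
    """Sample each arm's mean from a Beta posterior and play the argmax."""
    alpha, beta, total = [1] * N_ARMS, [1] * N_ARMS, 0  # Beta(1, 1) priors
    for _ in range(steps):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(N_ARMS)]
        arm = max(range(N_ARMS), key=lambda a: samples[a])
        r = pull(arm, rng)
        alpha[arm] += r
        beta[arm] += 1 - r  # posterior update from the Bernoulli outcome
        total += r
    return total
```

Over a few thousand steps, all three should earn well above the uniform-random baseline (expected reward 0.5 per step here) by concentrating plays on the 0.8 arm, with Thompson sampling and UCB1 exploring more adaptively than a fixed epsilon.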