
Design an A/B test for ML model launch

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competence in experimental design, statistical power analysis, sequential monitoring, covariate adjustment (e.g., CUPED), and practical operational concerns like guardrails, interference, and ramping for ML model launches.


Design an A/B test for ML model launch

Company: Snowflake

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: hard

Interview Round: Technical Screen



Feed Ranker A/B Test Design and Powering

You are replacing the current ranker with a new model in a feed. Baseline CTR is 2.0%. You expect a +5% relative lift on CTR and want 90% power at α = 0.05. Daily eligible traffic is 1,000,000 users; assignment is user-level 50/50 and stable over time.

Assume CTR is measured at the user level over the test window (Bernoulli per user: clicked ≥1 vs. not), users are independently and identically distributed within arms, and there is no interference (SUTVA) unless otherwise addressed.

A) Sample Size and Minimum Duration

  • Use a two-proportion power analysis for a two-sided test to detect an absolute lift of 0.1 percentage points (from 2.0% to 2.1%).
  • State and use your variance assumptions and show formulas and steps.
  • Adjust the resulting sample size for (i) 5% expected bot/invalid traffic and (ii) up to 1% sample ratio mismatch (SRM).
  • Compute the minimum test duration given daily traffic and 50/50 assignment.
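The arithmetic for part A can be sketched in a few lines. This is a minimal calculation using the standard two-proportion sample-size formula (pooled variance under H0 in the α term, unpooled under H1 in the β term); all inputs come from the prompt, and the bot/SRM adjustments are applied as simple enrollment inflation:

```python
from math import ceil, sqrt
from statistics import NormalDist

# Parameters from the prompt
p1 = 0.02                                   # baseline CTR
p2 = p1 * 1.05                              # +5% relative lift -> 0.021
alpha, power = 0.05, 0.90
z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96 (two-sided)
z_b = NormalDist().inv_cdf(power)           # ≈ 1.28

# Two-proportion sample size per arm
p_bar = (p1 + p2) / 2
n_per_arm = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
              + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
             / (p2 - p1) ** 2)

# Inflate enrollment so enough *valid* users survive bot filtering,
# with a small buffer for up to 1% sample ratio mismatch
bot_rate, srm_buffer = 0.05, 0.01
n_enrolled = ceil(2 * n_per_arm / (1 - bot_rate) / (1 - srm_buffer))

daily_traffic = 1_000_000
min_days = ceil(n_enrolled / daily_traffic)
print(f"per arm ≈ {ceil(n_per_arm):,}, enrolled ≈ {n_enrolled:,}, "
      f"min duration = {min_days} day(s)")
```

With these inputs the formula gives roughly 422,000 valid users per arm (≈897,000 enrolled after the bot and SRM adjustments), so power is reachable within a single day of traffic; the binding constraint on duration is day-of-week and novelty coverage, not power, so the test should still run at least two full weeks.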

B) Guardrails and Sequential Monitoring

  • Propose guardrail metrics (bounce rate, crashes, p95 latency, revenue per user) with decision thresholds and how you will test them (e.g., non-inferiority).
  • Describe a sequential monitoring plan using α-spending (e.g., Pocock or O’Brien–Fleming via Lan–DeMets) that allows early stopping without inflating Type I error.
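The Lan–DeMets O'Brien–Fleming-type α-spending function can be evaluated directly to show how little α the plan spends at early looks. A sketch with a hypothetical schedule of five equally spaced interim analyses (the exact stopping boundaries at each look are then solved numerically from the incremental spend, e.g. with a group-sequential package):

```python
from math import sqrt
from statistics import NormalDist

def obf_spend(t: float, alpha: float = 0.05) -> float:
    """Lan–DeMets O'Brien–Fleming-type alpha-spending function.

    Returns the cumulative two-sided alpha spent by information fraction t.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / sqrt(t)))

# Hypothetical schedule: five equally spaced looks
looks = [0.2, 0.4, 0.6, 0.8, 1.0]
spent = [obf_spend(t) for t in looks]
for t, a in zip(looks, spent):
    print(f"info fraction {t:.1f}: cumulative alpha spent = {a:.6f}")
```

The schedule is extremely conservative early (on the order of 1e-5 spent by 20% information) and only reaches the full 0.05 at the final look, which is what lets you peek without inflating overall Type I error.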

C) Novelty, Day-of-Week Effects, and CUPED

  • Address novelty and day-of-week effects (e.g., run for whole weeks and check whether the lift decays over time).
  • Propose CUPED (covariate adjustment) using pre-experiment user CTR: specify the covariate, the adjustment, and how you would validate variance reduction without bias.
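The CUPED adjustment itself is two lines: θ = cov(X, Y) / var(X) and Y' = Y − θ(X − X̄), where X is pre-experiment user CTR. Because X is measured before assignment, it is independent of treatment, so the adjustment reduces variance without biasing the lift. A sketch on synthetic data (the effect sizes and noise levels are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic users: pre-period CTR (covariate X) correlated with
# in-experiment CTR (outcome Y); treatment adds a small true lift of 0.001
n = 200_000
x = rng.beta(2, 98, size=n)                     # pre-experiment CTR per user
treat = rng.integers(0, 2, size=n).astype(bool)
y = 0.5 * x + rng.normal(0, 0.01, size=n) + 0.001 * treat

# CUPED: theta from pooled data; X predates assignment, so no bias
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

lift_raw = y[treat].mean() - y[~treat].mean()
lift_cuped = y_cuped[treat].mean() - y_cuped[~treat].mean()
var_reduction = 1 - np.var(y_cuped) / np.var(y)
print(f"raw lift {lift_raw:.5f}, CUPED lift {lift_cuped:.5f}, "
      f"variance reduced by {var_reduction:.1%}")
```

Validation is exactly what the print shows: the CUPED lift estimate stays centered on the true effect (no bias) while the outcome variance, and hence the confidence interval width, shrinks. An A/A test with the same adjustment is the standard production check.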

D) Interference and Contamination

  • Mitigate interference: ensure user-level bucketing, prevent cross-arm content spillover, and plan a geo or holdout if network effects are suspected.
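One concrete piece of part D is deterministic user-level bucketing. A sketch (the experiment name `ranker_v2` and the 1000-bucket granularity are arbitrary illustrative choices): salting the hash with the experiment name keeps each user's assignment stable for the life of the test while decorrelating it from assignments in other experiments.

```python
import hashlib

def assign_arm(user_id: str, experiment: str = "ranker_v2",
               arms=("control", "treatment")) -> str:
    """Deterministic, stable user-level bucketing via a salted hash."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 1000       # 1000 buckets -> 0.1% granularity
    return arms[0] if bucket < 500 else arms[1]

# Repeated calls for the same user always return the same arm
print(assign_arm("user_42"), assign_arm("user_42"))
```

Fine-grained buckets also make later ramping easy: shifting the 500 cutoff moves traffic in 0.1% increments without reshuffling users already assigned.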

E) Post-Test Checks and Ramp

  • After the test, outline checks for achieved power, heterogeneity of treatment effects across cohorts, and how you’d decide to ramp to 100%.
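Two of the part E checks are short computations: an SRM test against the planned 50/50 split, and the power actually achieved for the planned effect at the realized valid sample size. A sketch (the counts passed in below are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def srm_check(n_control: int, n_treatment: int, expected: float = 0.5):
    """Two-sided z-test for sample ratio mismatch vs. the planned split."""
    n = n_control + n_treatment
    se = sqrt(expected * (1 - expected) / n)
    z = (n_treatment / n - expected) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def achieved_power(n_per_arm: int, p1: float, p2: float,
                   alpha: float = 0.05) -> float:
    """Power actually achieved for the planned effect at the realized n."""
    nd = NormalDist()
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)
    return nd.cdf(abs(p2 - p1) / se - nd.inv_cdf(1 - alpha / 2))

z, p = srm_check(497_000, 503_000)        # hypothetical realized counts
print(f"SRM z = {z:.2f}, p = {p:.2e}")    # a tiny p flags broken assignment
print(f"achieved power ≈ {achieved_power(422_000, 0.02, 0.021):.3f}")
```

A significant SRM p-value invalidates the readout regardless of the lift; if both checks pass and guardrails are flat, ramp in stages (e.g., 50% → 100%) while monitoring the same metrics rather than flipping at once.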
