Evaluate Metrics and Randomization for Onboarding Tutorial Change
Scenario
A single step within Confluent’s multi-step user-onboarding tutorial was modified. The product team wants to run an experiment to determine whether the change improves the user experience specifically at that step, while ensuring no negative side effects on the overall onboarding flow.
Assumptions for clarity:
- The tutorial consists of ordered steps (1…k). Only step i was changed; all other steps remain unchanged.
- We can instrument events at the step level: step_i_view, step_i_submit, step_i_success, step_i_error, help_click, backtrack, abandon, timestamps.
- Users may belong to accounts (organizations) with multiple users.
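To make the step-level instrumentation concrete, here is a minimal sketch of how step-specific metrics could be derived from the events listed above. The event names come from the assumptions; the `(user_id, event_name, ts_seconds)` tuple layout and the `step_metrics` helper are hypothetical, chosen only for illustration.

```python
from statistics import median


def step_metrics(events):
    """Summarize step i from (user_id, event_name, ts_seconds) tuples.

    The tuple layout is an assumption made for this sketch, not a known schema.
    """
    first_view = {}     # user_id -> timestamp of first step_i_view
    first_success = {}  # user_id -> timestamp of first step_i_success after a view
    had_error = set()   # users who hit at least one step_i_error after viewing

    # Process events in time order so "first" really means earliest.
    for user_id, name, ts in sorted(events, key=lambda e: e[2]):
        if name == "step_i_view":
            first_view.setdefault(user_id, ts)
        elif name == "step_i_success" and user_id in first_view:
            first_success.setdefault(user_id, ts)
        elif name == "step_i_error" and user_id in first_view:
            had_error.add(user_id)

    viewers = len(first_view)
    durations = [first_success[u] - first_view[u] for u in first_success]
    return {
        "viewers": viewers,
        # Micro-conversion: share of users who viewed step i and completed it.
        "completion_rate": len(first_success) / viewers if viewers else None,
        # Time-to-complete is skewed in practice, so report the median.
        "median_time_to_complete_s": median(durations) if durations else None,
        # Share of viewers who saw at least one error on the step.
        "error_rate": len(had_error) / viewers if viewers else None,
    }
```

Running this per experiment arm yields a candidate primary metric (`completion_rate`) and diagnostics (`median_time_to_complete_s`, `error_rate`); the same pattern would extend to help_click, backtrack, and abandon.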
Questions
- Metrics
  - Which primary and secondary metrics would you track that are highly specific to the modified step?
- Experiment design
  - At which level would you randomize (user vs. account), and what covariates would you examine to verify comparable groups?
- Inference and sizing
  - Which statistical test(s) would you use? How would you compute required sample size and expected runtime? What alternative test would you prefer if the sample size turns out to be very small?
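For the sizing question, one common approach is the normal-approximation formula for comparing two proportions. The sketch below assumes the primary metric is a step-completion rate; the function names and the `design_effect` parameter (an inflation factor of the form 1 + (m - 1) × ICC, relevant if randomization is at the account level) are illustrative assumptions, and the baseline rate and effect size must come from your own data.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_baseline, mde_abs, alpha=0.05, power=0.80,
                        design_effect=1.0):
    """Users needed per arm for a two-sided two-proportion z-test.

    p_baseline: control completion rate (assumed known from historical data).
    mde_abs: minimum detectable effect as an absolute difference in rates.
    design_effect: >1 inflates n when users are clustered within accounts.
    """
    p1 = p_baseline
    p2 = p_baseline + mde_abs
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(design_effect * numerator / mde_abs ** 2)


def runtime_days(n_per_arm, daily_users_reaching_step, arms=2):
    """Days needed if all users reaching step i are enrolled and split evenly."""
    return ceil(arms * n_per_arm / daily_users_reaching_step)
```

For example, `runtime_days(sample_size_per_arm(0.70, 0.03), 500)` would estimate the duration for a hypothetical 70% baseline, a 3-point lift, and 500 users reaching the step per day. In practice the runtime is usually rounded up to whole weeks to cover weekly seasonality.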
Hints
Think micro-conversion rates, time-to-complete, event drop-offs; discuss unit-of-analysis alignment and balance checks; consider t/Z tests, nonparametrics or Bayesian for small samples.
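As an illustration of the test choices hinted at above, the sketch below pairs a pooled two-proportion z-test (reasonable for large samples) with a two-sided Fisher exact test (an exact alternative when counts are small). Both are written with the standard library only; the function names and argument order are assumptions of this sketch, and a library such as SciPy would normally be used instead.

```python
from math import comb, sqrt
from statistics import NormalDist


def two_proportion_z_test(x_c, n_c, x_t, n_t):
    """Pooled two-sided z-test on completion counts x out of n per arm.

    Assumes the pooled rate is strictly between 0 and 1 (otherwise se is 0).
    """
    pooled = (x_c + x_t) / (n_c + n_t)
    se = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (x_t / n_t - x_c / n_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def fisher_exact_two_sided(x_c, n_c, x_t, n_t):
    """Two-sided Fisher exact p-value for the same 2x2 table.

    Sums the hypergeometric probabilities of all tables (with the observed
    margins) that are no more likely than the observed table.
    """
    successes = x_c + x_t
    denom = comb(n_c + n_t, successes)

    def prob(k):
        # P(k successes in treatment | fixed margins)
        return comb(n_t, k) * comb(n_c, successes - k) / denom

    lo = max(0, successes - n_c)
    hi = min(n_t, successes)
    p_obs = prob(x_t)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(lo, hi + 1)
                if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

If randomization is by account but the metric is per user, these user-level tests understate the variance; aggregating to one value per account, or using cluster-robust standard errors, keeps the unit of analysis aligned with the unit of randomization.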
Constraints & Assumptions
- Preserve the scope, facts, inputs, and requested outputs from the prompt above.
- If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
- Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.
Clarifying Questions to Ask
- Clarify the business objective, unit of analysis, time window, exposure definition, and primary metric.
- State assumptions about instrumentation, randomization, sample size, and data quality.
- Separate descriptive analysis from causal claims.
What a Strong Answer Covers
- A metric framework with primary, guardrail, and diagnostic metrics.
- A credible analysis or experiment design with clear assumptions and bias checks.
- SQL/statistical logic for segmentation, variance, confidence, and data validation where relevant.
- An actionable recommendation that explains trade-offs and next steps.
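One way to make the bias checks above concrete is a covariate balance check using the standardized mean difference (SMD) between arms. The sketch below is illustrative: the helper name is hypothetical, the covariate could be any pre-treatment quantity (for example, steps completed before reaching step i), and the |SMD| < 0.1 threshold is a common rule of thumb rather than a fixed standard.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(control, treatment):
    """SMD of one pre-treatment covariate between arms.

    Uses the average of the two sample variances as the pooled variance.
    Each arm needs at least two observations.
    """
    pooled_sd = sqrt((variance(control) + variance(treatment)) / 2)
    if pooled_sd == 0:
        # No spread in either arm: report 0 only if the means also agree.
        return 0.0 if mean(control) == mean(treatment) else float("inf")
    return (mean(treatment) - mean(control)) / pooled_sd
```

Because the covariates are measured before exposure, a large SMD points to a randomization or logging problem rather than a treatment effect.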
Follow-up Questions
- What sanity checks would you run before trusting the result?
- How would you handle novelty effects, seasonality, or selection bias?
- What decision would you make if metrics disagree?
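A typical first sanity check is a sample ratio mismatch (SRM) test, which compares the observed arm sizes with the planned split. The sketch below uses a chi-square goodness-of-fit test with one degree of freedom; the function name, the 50/50 default, and the strict `alpha=0.001` threshold are assumptions chosen for illustration.

```python
from math import erfc, sqrt


def srm_check(n_control, n_treatment, expected_treatment_share=0.5,
              alpha=0.001):
    """Return (chi2, p_value, srm_detected) for a two-arm experiment."""
    total = n_control + n_treatment
    expected_t = total * expected_treatment_share
    expected_c = total - expected_t
    chi2 = ((n_control - expected_c) ** 2 / expected_c
            + (n_treatment - expected_t) ** 2 / expected_t)
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha
```

If an SRM is flagged, the usual response is to debug assignment and logging (for instance, whether step_i_view fires equally in both arms) before interpreting any metric.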