You are a Data Scientist supporting a large consumer product (e.g., YouTube). A team ships a change intended to reduce client-side latency.
Part A — Experiment design (latency → business impact)
- Propose an **A/B test** to estimate the causal impact of reduced latency on user engagement.
- Define:
  - **Primary metric(s)** (at least one), and why they are sensitive to latency.
  - **Diagnostic metrics** to understand where changes come from.
  - **Guardrail metrics** (quality/revenue/reliability) to avoid shipping regressions.
- Describe:
  - Unit of randomization (user/device/session) and why.
  - Power/MDE approach and what variance drivers you’d account for (a power sketch follows this list).
  - Key threats to validity (e.g., novelty, network effects, logging changes, missing data).
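A minimal power/MDE sketch, assuming a two-sided two-sample t-test on a per-user engagement metric; the baseline mean/std and MDE below are illustrative placeholders, not real product numbers.

```python
# Sketch: users per arm needed to detect a small relative lift.
# Baseline values are assumed placeholders, not real metrics.
from statsmodels.stats.power import TTestIndPower

baseline_mean = 25.0    # e.g., minutes watched per user per week (assumed)
baseline_std = 40.0     # engagement is heavy-tailed, so std >> mean (assumed)
mde_relative = 0.005    # detect a 0.5% relative lift

effect_size = baseline_mean * mde_relative / baseline_std  # Cohen's d
n_per_arm = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided")
print(f"Cohen's d = {effect_size:.4f}; users per arm ≈ {n_per_arm:,.0f}")
```

Estimate the variance input from historical data at the chosen randomization unit: if you randomize by user but analyze sessions, within-user correlation inflates the effective variance, and the naive calculation understates the required sample.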
Part B — Variance reduction
What techniques would you use to reduce variance (or improve sensitivity) in this experiment? Explain when each is appropriate (e.g., CUPED, stratification, winsorization, triggering, clustered standard errors).
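For CUPED specifically, a minimal sketch assuming a user-level table with a pre-period covariate (the column names `metric`, `pre_metric`, and `treated` are hypothetical):

```python
# Sketch: CUPED-adjusted outcome Y' = Y - theta * (X - mean(X)),
# where X is a pre-experiment covariate (unaffected by treatment).
import numpy as np
import pandas as pd

def cuped_adjust(df: pd.DataFrame, metric: str, pre_metric: str) -> pd.Series:
    x, y = df[pre_metric], df[metric]
    # theta is the OLS slope of Y on X, pooled across arms; pooling is
    # valid under randomization because X predates assignment.
    theta = np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)
    return y - theta * (x - x.mean())

# Usage: run the usual two-sample test on the adjusted column.
# df["metric_cuped"] = cuped_adjust(df, "metric", "pre_metric")
```

CUPED helps most when the pre-period covariate is strongly correlated with the outcome: variance shrinks by roughly a factor of (1 − ρ²).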
Part C — Diagnosing a ratio metric change
Suppose leadership cares about a ratio metric like CTR = clicks / impressions, or conversion rate = purchases / sessions.
- If the ratio moved by +0.3%, outline a structured approach to diagnose *why* it changed.
- Explain how you would decompose the change into numerator/denominator effects and guard against misleading interpretations (e.g., Simpson’s paradox); a decomposition sketch follows this list.
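A minimal decomposition sketch, assuming per-segment click/impression totals with arms labeled `"C"` and `"T"`; all column names are hypothetical. It splits the overall CTR delta exactly into a within-segment rate effect and a traffic mix-shift effect, which is where Simpson’s paradox hides:

```python
# Sketch: delta_CTR = sum_s wbar_s*(pT_s - pC_s) + sum_s pbar_s*(wT_s - wC_s),
# an exact two-part (Oaxaca-Blinder-style) decomposition by segment.
import pandas as pd

def decompose_ctr(df: pd.DataFrame) -> pd.DataFrame:
    g = df.groupby(["arm", "segment"])[["clicks", "impressions"]].sum()
    p = (g["clicks"] / g["impressions"]).unstack("arm")          # segment CTR
    w = (g["impressions"]
         / g["impressions"].groupby(level="arm").transform("sum")
         ).unstack("arm")                                        # impression share
    wbar, pbar = w.mean(axis=1), p.mean(axis=1)                  # cross-arm means
    return pd.DataFrame({
        "rate_effect": wbar * (p["T"] - p["C"]),  # CTR moved within segments
        "mix_effect":  pbar * (w["T"] - w["C"]),  # traffic composition shifted
    })  # summing both columns recovers the overall CTR delta exactly
```

If `mix_effect` dominates, the +0.3% headline reflects a change in where impressions come from (a denominator story), not genuinely better per-segment click-through.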
Part D — When randomization is not possible (propensity score matching)
Assume you cannot randomize the latency change (e.g., it rolled out selectively due to infra constraints). You only observe that some users experienced lower latency than others.
- Describe how you would use **propensity score matching (PSM)** to estimate the impact of latency on engagement.
- List the assumptions required for PSM to be credible and how you would validate/sensitivity-test them (a matching sketch follows this list).
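A minimal PSM sketch, assuming a user-level frame with pre-period covariates; it fits a propensity model for exposure to low latency, does 1:1 nearest-neighbor matching on the score, and checks balance via standardized mean differences. Column names are hypothetical, and a real analysis would add calipers, common-support trimming, and sensitivity analysis (e.g., Rosenbaum bounds):

```python
# Sketch: 1:1 nearest-neighbor PSM for the ATT, with a balance check.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def psm_att(df: pd.DataFrame, covariates: list[str],
            treat_col: str = "low_latency", outcome_col: str = "engagement"):
    t = df[treat_col].astype(bool).to_numpy()

    # 1) Propensity: P(exposed to low latency | pre-period covariates).
    ps = (LogisticRegression(max_iter=1000)
          .fit(df[covariates], t).predict_proba(df[covariates])[:, 1])

    # 2) Match each treated user to the nearest control on the score.
    nn = NearestNeighbors(n_neighbors=1).fit(ps[~t].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[t].reshape(-1, 1))
    matched_controls = df.loc[~t].iloc[idx.ravel()]

    # 3) ATT on the matched sample.
    att = df.loc[t, outcome_col].mean() - matched_controls[outcome_col].mean()

    # 4) Balance: standardized mean differences (aim for |SMD| < 0.1).
    smd = ((df.loc[t, covariates].mean() - matched_controls[covariates].mean())
           / df[covariates].std(ddof=1))
    return att, smd
```

Credibility rests on unconfoundedness (every common cause of latency exposure and engagement is among the covariates) and overlap; inspect the propensity score distributions for common support before trusting the match.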
Assumptions
- You can define a pre-period to compute baselines/covariates.
- Logging is available for latency, exposure, and key engagement outcomes.
- Timezone: use a consistent reporting timezone (e.g., UTC) for daily metrics to avoid boundary artifacts (a bucketing sketch follows).
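A tiny sketch of that bucketing assumption (the `ts` values are made up):

```python
# Sketch: bucket events by UTC day so daily metrics share one boundary.
import pandas as pd

events = pd.DataFrame({"ts": ["2024-03-01 23:30:00-08:00",
                              "2024-03-02 00:10:00+01:00"]})
events["utc_day"] = pd.to_datetime(events["ts"], utc=True).dt.date
# Each event's UTC day differs from its local calendar day; bucketing
# everything in UTC keeps daily counts consistent across regions.
```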