Design a clustered A/B test with spillovers
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: HR Screen
You need to test a social feature likely to cause network spillovers. You will randomize by geographic market clusters, not by user.
1) Unit of randomization: Justify cluster-level randomization and specify the estimand (cluster-average treatment effect). Define a contamination scenario that would violate SUTVA if you randomized by user.
2) Sample size with ICC: Baseline conversion = 10%, target absolute lift = +1 pp, α=0.05 (two-sided), power=0.80. You have 200 clusters per arm with average m=200 users observed per cluster and intracluster correlation ICC=0.06. Compute the design effect DEFF = 1 + (m−1)·ICC and the effective sample size N_eff per arm. Explain how DEFF changes if you halve m but double the number of clusters (holding total users fixed).
3) Assignment: Describe a principled way to form clusters to minimize cross-cluster edges (e.g., graph partitioning) and how you’d check balance pre-experiment (e.g., standardized mean differences on cluster-level covariates).
4) Gradual change: If adoption ramps gradually across treated clusters, propose an analysis plan (e.g., staggered adoption difference-in-differences with cluster and time fixed effects). State one assumption required for identification and one robustness check.
5) Guardrails and metrics: Define primary, secondary, and guardrail metrics. Specify how you will handle multiple testing and early stopping.
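The design-effect arithmetic asked for in part 2 can be sketched in a few lines. This is a minimal illustration, not a reference solution: the function names are hypothetical, and the individual-level sample-size benchmark uses the unpooled normal approximation for two proportions, which is an assumption (the prompt does not name a formula).

```python
from statistics import NormalDist


def design_effect(m, icc):
    """DEFF = 1 + (m - 1) * ICC for equal cluster size m."""
    return 1.0 + (m - 1) * icc


def effective_n(clusters, m, icc):
    """Effective sample size per arm: total users divided by DEFF."""
    return clusters * m / design_effect(m, icc)


def required_n_per_arm(p0, lift, alpha=0.05, power=0.80):
    """Per-arm n under independent sampling (assumed: unpooled normal
    approximation for a two-sided test of two proportions)."""
    p1 = p0 + lift
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / lift ** 2


# Scenario from the prompt: 200 clusters per arm, m = 200, ICC = 0.06.
base = (200, 200, 0.06)
# Alternative: halve m, double the clusters, total users held fixed.
alt = (400, 100, 0.06)

for label, (k, m, icc) in (("base", base), ("alt", alt)):
    print(label, round(design_effect(m, icc), 2), round(effective_n(k, m, icc), 1))

print("required per arm:", round(required_n_per_arm(0.10, 0.01)))
```

Under these assumptions the base design gives DEFF = 12.94 and N_eff of about 3,091 per arm out of 40,000 users, while halving m and doubling the clusters gives DEFF = 6.94 and N_eff of about 5,764. Both fall short of the roughly 14,700 independent users per arm that the approximation requires for a +1 pp lift, so a strong answer would note that the design as stated is underpowered.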
Quick Answer: This question evaluates a data scientist's understanding of cluster-randomized experiments with spillovers, covering causal inference under interference, intracluster correlation and power/sample-size calculations, cluster formation and balance checks, staggered-adoption analysis, and metrics and multiple-testing control.
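For the pre-experiment balance check in part 3, one possible sketch computes standardized mean differences (SMD) on cluster-level covariates. The covariate names and values below are made up for illustration, and the 0.1 flagging threshold is a common rule of thumb rather than anything the prompt specifies.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2), sample variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2.0)
    if pooled_sd == 0:
        raise ValueError("SMD is undefined when both groups have zero variance")
    return (mean(treated) - mean(control)) / pooled_sd


def flag_imbalanced(covariates, threshold=0.1):
    """Return names of covariates whose |SMD| exceeds the threshold.

    `covariates` maps a name to a (treated_values, control_values) pair,
    one value per cluster.
    """
    return [
        name
        for name, (treated, control) in covariates.items()
        if abs(standardized_mean_difference(treated, control)) > threshold
    ]


# Hypothetical cluster-level covariates (one value per market cluster).
example = {
    "baseline_conversion": ([0.10, 0.11, 0.09, 0.10], [0.10, 0.09, 0.11, 0.10]),
    "log_population": ([11.2, 12.0, 11.8, 12.4], [10.1, 10.9, 11.0, 10.6]),
}
print(flag_imbalanced(example))
```

If a covariate is flagged, typical remedies are to re-randomize, or to stratify or pair-match clusters on that covariate before assignment.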