Clustered And Networked Experiments
Asked of: Data Scientist

What's being tested
Ability to design and analyze experiments when units are connected: recognizing interference, choosing appropriate randomization (cluster, two-stage, or graph-based), and estimating direct and spillover effects with correct variance inference.
Core knowledge
- SUTVA violation: interference means potential outcomes depend on neighbors' treatments; define explicit exposure mappings.
- Partial interference: assume interference only within clusters to enable cluster-randomized inference.
- Cluster-RCT variance inflation: VIF = 1 + (m̄ - 1)·ICC; effective sample size reduced by ICC and cluster size heterogeneity.
- Randomization designs: cluster randomization, two-stage (randomize clusters then units), and graph cluster randomization (METIS, modularity-based).
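- Estimands & estimators: direct effect and spillover (peer) effect; estimate with Horvitz–Thompson or Hájek weighting by exposure probabilities, or with a difference in means conditional on exposure.
- Inference: randomization inference (permutation tests that re-draw assignments under the actual design) stays valid under complex dependence; use cluster-robust SEs and mixed-effects models only when their clustering matches the unit of randomization.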
- Power planning: simulate on observed graph; account for cluster sizes, edge cuts, and assumed spillover kernel (e.g., binary neighbor treated vs fraction treated).
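The variance-inflation bullet above can be made concrete with a short calculation. This is a minimal sketch, not a full power analysis: the function names are illustrative, and the unequal-size adjustment 1 + ((cv² + 1)·m̄ − 1)·ICC is a commonly used approximation (cv is the coefficient of variation of cluster sizes) that is assumed here rather than taken from the bullets.

```python
from statistics import mean, pstdev

def design_effect(cluster_sizes, icc):
    """VIF = 1 + (m_bar - 1) * ICC, using the mean cluster size m_bar."""
    m_bar = mean(cluster_sizes)
    return 1 + (m_bar - 1) * icc

def design_effect_unequal(cluster_sizes, icc):
    """Approximate VIF with cluster-size heterogeneity (assumed formula):
    1 + ((cv^2 + 1) * m_bar - 1) * ICC, where cv = sd(sizes) / mean(sizes)."""
    m_bar = mean(cluster_sizes)
    cv = pstdev(cluster_sizes) / m_bar
    return 1 + ((cv ** 2 + 1) * m_bar - 1) * icc

def effective_sample_size(cluster_sizes, icc):
    """Total units divided by the heterogeneity-adjusted design effect."""
    return sum(cluster_sizes) / design_effect_unequal(cluster_sizes, icc)

# Same total units and same mean cluster size, different balance.
balanced = [50] * 20
unbalanced = [10] * 10 + [90] * 10
icc = 0.05  # assumed intra-cluster correlation

print(design_effect(balanced, icc))
print(design_effect_unequal(unbalanced, icc))
print(effective_sample_size(balanced, icc))
print(effective_sample_size(unbalanced, icc))
```

With the mean cluster size held fixed, the unbalanced partition has the larger design effect and the smaller effective sample size, which is the heterogeneity penalty the bullet refers to.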
Worked example
Typical framing: "Measure ad effectiveness when users influence each other." First, state the estimands: the average direct effect (user treated, neighbors in control) and the spillover effect (user in control, as a function of the fraction of treated neighbors). Next, pick a design: if interference is local, form graph clusters (e.g., with METIS) and randomize clusters; alternatively, use a two-stage design in which each cluster is assigned a target treatment proportion and units are then randomized within it. Define the exposure mapping (e.g., ≥1 treated neighbor vs. none). For power, simulate outcomes under assumed direct and spillover parameters on the real graph to choose the number of clusters and the treated fraction. Analysis plan: estimate Horvitz–Thompson or Hájek contrasts between exposure cells, use randomization inference for p-values, and report sensitivity to alternative exposure definitions.
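The exposure-mapping and estimation steps can be sketched in plain Python. This is an illustration under stated assumptions, not a production estimator: the toy path graph, the Bernoulli cluster assignment, the binary "≥1 treated neighbor" mapping, and all function names are hypothetical. Exposure probabilities are approximated by Monte Carlo over the design rather than derived in closed form.

```python
import random

def assign_clusters(cluster_of, p, rng):
    """Bernoulli(p) treatment per cluster; returns a unit-level 0/1 list."""
    z_cluster = {c: int(rng.random() < p) for c in sorted(set(cluster_of))}
    return [z_cluster[c] for c in cluster_of]

def exposure(adj, z):
    """Exposure mapping: (own treatment, 1 if at least one neighbor is treated)."""
    return [(z[i], int(any(z[j] for j in adj[i]))) for i in range(len(z))]

def exposure_probs(adj, cluster_of, p, cell, draws, rng):
    """Monte Carlo estimate of P(unit i lands in `cell`) under the design."""
    counts = [0] * len(adj)
    for _ in range(draws):
        e = exposure(adj, assign_clusters(cluster_of, p, rng))
        for i, ei in enumerate(e):
            counts[i] += int(ei == cell)
    return [c / draws for c in counts]

def hajek_mean(y, e, cell, probs):
    """Hajek (ratio) estimate of the mean outcome in one exposure cell.
    Units with estimated probability 0 for the cell are skipped."""
    num = den = 0.0
    for yi, ei, pi in zip(y, e, probs):
        if ei == cell and pi > 0:
            num += yi / pi
            den += 1 / pi
    return num / den if den > 0 else float("nan")

# Toy path graph 0-1-2-3-4-5 split into three clusters of two units.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
cluster_of = [0, 0, 1, 1, 2, 2]
rng = random.Random(0)

z = assign_clusters(cluster_of, 0.5, rng)
e = exposure(adj, z)
y = [float(rng.gauss(0, 1)) for _ in adj]  # placeholder outcomes

# Spillover contrast: control units with vs. without a treated neighbor.
p_spill = exposure_probs(adj, cluster_of, 0.5, (0, 1), 2000, rng)
p_pure = exposure_probs(adj, cluster_of, 0.5, (0, 0), 2000, rng)
spillover = hajek_mean(y, e, (0, 1), p_spill) - hajek_mean(y, e, (0, 0), p_pure)
```

On this toy graph every unit has a neighbor in its own cluster, so the cell (1, 0), treated with no treated neighbor, has probability zero under pure cluster randomization. That is one reason to consider a two-stage design when the direct effect with control neighbors is the estimand. A randomization-inference p-value would reuse `assign_clusters` to re-draw assignments and recompute the contrast under the sharp null.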
A common pitfall
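The tempting move is to ignore interference and run a standard unit-level A/B test with cluster-robust SEs. Under unit-level randomization, control users are still exposed through treated neighbors, so the treated-vs-control contrast is biased for the total effect; cluster-robust SEs adjust for correlated errors but do not repair this contamination, and they can still understate uncertainty when the dependence does not follow the assumed clusters. Another frequent error is clustering purely to minimize edge cuts without checking cluster-size balance: a partition with a few giant clusters and many tiny ones leaves few effective clusters, which inflates variance and kills power.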
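The first pitfall can be illustrated with a small simulation. This is a sketch under strong assumptions: the graph is a set of disjoint cliques (so partial interference holds exactly), and the outcome model, direct effect plus spillover times the fraction of treated neighbors with no noise, is hypothetical, as are the parameter values.

```python
import random

def make_cliques(n_clusters, size):
    """Units in the same cluster are all neighbors; no cross-cluster edges."""
    cluster_of = [c for c in range(n_clusters) for _ in range(size)]
    adj = []
    for i, c in enumerate(cluster_of):
        start = c * size
        adj.append([j for j in range(start, start + size) if j != i])
    return adj, cluster_of

def outcomes(adj, z, direct, spill):
    """Assumed model: direct * own treatment + spill * fraction of treated neighbors."""
    y = []
    for i, nbrs in enumerate(adj):
        frac = sum(z[j] for j in nbrs) / len(nbrs)
        y.append(direct * z[i] + spill * frac)
    return y

def diff_in_means(y, z):
    """Plain treated-minus-control difference in mean outcomes."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

rng = random.Random(7)
n_clusters = 200
adj, cluster_of = make_cliques(n_clusters, size=5)
direct, spill = 1.0, 2.0  # total effect (all treated vs. none) = direct + spill

# Unit-level randomization: both arms see similar neighbor exposure,
# so the contrast recovers roughly the direct effect only.
z_unit = [int(rng.random() < 0.5) for _ in adj]
naive = diff_in_means(outcomes(adj, z_unit, direct, spill), z_unit)

# Cluster-level randomization: treated units have fully treated neighborhoods,
# so the contrast recovers the total effect in this idealized graph.
z_by_cluster = [int(rng.random() < 0.5) for _ in range(n_clusters)]
z_cluster = [z_by_cluster[c] for c in cluster_of]
clustered = diff_in_means(outcomes(adj, z_cluster, direct, spill), z_cluster)

print(f"unit-level estimate:    {naive:.2f}")
print(f"cluster-level estimate: {clustered:.2f}")
print(f"true total effect:      {direct + spill:.2f}")
```

On real graphs clusters are never perfectly isolated, so the cluster-level contrast reduces this bias rather than removing it; the gap depends on how many edges the partition cuts.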
Further reading
- Hudgens, M. G., & Halloran, M. E. (2008). "Toward causal inference with interference." Statistica Sinica.
- Ugander, J., Karrer, B., Backstrom, L., & Kleinberg, J. (2013). "Graph Cluster Randomization." WWW 2013.
Related concepts
- Geo and Clustered Experiments
- Cluster Randomized Experiments And Network Interference (Analytics & Experimentation)
- Experimentation Under Network Interference
- Network Interference And Cluster Randomization (Analytics & Experimentation)
- Statistical Inference For Experiments
- Causal Inference And Quasi-Experiments