Design cluster-randomized test under network effects

This question evaluates a data scientist's experiment design and causal inference skills under interference, covering exposure and estimand definition, graph-based cluster construction, cluster-level randomization and contamination mitigation, metric selection, and design-effect/sample-size calculations.

Question

A/B Test Design for a New Group Call Feature with Network Effects

You are designing an experiment for a Group Call feature where social network effects and interference are expected. Assume a social graph between users, and that the feature is delivered at the user level but can be constrained by design. Address the following:

(a) Define exposure and the estimand(s):

Clearly define user exposure under interference (e.g., own treatment plus share of treated neighbors).
Specify the primary estimand: intent-to-treat (ITT) at the cluster level vs a per-user average treatment effect (ATE) under interference. Be explicit about the population and exposure mapping.
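One way to make the exposure mapping concrete is the example the prompt itself suggests: a user's own treatment plus the share of treated neighbors. The sketch below is illustrative only; the function name `exposure`, the toy graph, and the choice to give users with no neighbors a share of 0.0 are assumptions, not part of the question.

```python
from typing import Dict, Set, Tuple


def exposure(user: str,
             treated: Set[str],
             neighbors: Dict[str, Set[str]]) -> Tuple[int, float]:
    """Return (own treatment indicator, share of treated neighbors)."""
    own = 1 if user in treated else 0
    nbrs = neighbors.get(user, set())
    # Assumption: a user with no neighbors has a treated-neighbor share of 0.0.
    share = len(nbrs & treated) / len(nbrs) if nbrs else 0.0
    return own, share


# Hypothetical toy graph: "a" is tied to "b" and "c"; "d" is isolated.
neighbors = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": set()}
treated = {"a", "b"}
exposures = {u: exposure(u, treated, neighbors) for u in neighbors}
```

Under this mapping, user "c" is untreated but fully exposed through its only neighbor, which is exactly the case a per-user ATE has to account for under interference.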

(b) Construct clusters from the social graph:

Describe how to build edge weights for "strong ties" (which signals to include and how to combine them).
Propose a clustering rule (e.g., threshold strong ties then take connected components, or community detection) that yields disjoint clusters of reasonable size.
Explain how you will prevent cluster overlap and cap cluster size.
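A minimal sketch of one possible answer to (b): combine interaction signals into an edge weight, keep only edges above a strong-tie threshold, and merge users with a union-find structure that refuses any merge exceeding the size cap. The signals (`calls`, `messages`), their weights, the threshold, and the cap are all hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def tie_weight(calls: int, messages: int,
               w_calls: float = 2.0, w_msgs: float = 1.0) -> float:
    """Hypothetical linear combination of two interaction signals."""
    return w_calls * calls + w_msgs * messages


def build_clusters(users: Iterable[str],
                   edges: List[Tuple[str, str, float]],
                   threshold: float,
                   max_size: int) -> Dict[str, str]:
    """Map each user to a cluster label; clusters are disjoint and size-capped."""
    parent = {u: u for u in users}
    size = {u: 1 for u in parent}

    def find(u: str) -> str:
        # Path halving keeps the trees shallow.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Strongest ties first, so the cap cuts the weakest ties when it binds.
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if w < threshold:
            break  # every remaining edge is a weak tie
        ru, rv = find(u), find(v)
        if ru != rv and size[ru] + size[rv] <= max_size:
            parent[rv] = ru
            size[ru] += size[rv]
    return {u: find(u) for u in parent}


edges = [("a", "b", 5.0), ("b", "c", 4.0), ("c", "d", 3.0), ("d", "e", 0.5)]
clusters = build_clusters("abcde", edges, threshold=1.0, max_size=3)
```

Because every user has exactly one root, clusters cannot overlap. In the toy data the cap of 3 keeps "d" out of the a-b-c cluster even though the c-d tie clears the threshold, so that edge becomes a cross-cluster edge.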

(c) Randomize at the cluster level and handle cross-cluster edges:

Describe the randomization scheme (stratification, balance criteria).
Define "frontier" users (those with cross-cluster edges) and how you’ll mitigate contamination (e.g., holdout buffers, gating cross-cluster invitations, or partial saturation). State whether frontier users are included in the analysis or excluded, and under which rules.
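The sketch below shows one way part (c) could be operationalized, under stated assumptions: clusters are stratified by size alone (sorted, paired, and split within each pair), and a frontier user is anyone with at least one edge into a different cluster. The function names and the size-only stratification are illustrative; a real design would likely also balance on pre-period metrics.

```python
import random
from typing import Dict, Iterable, Set, Tuple


def assign_clusters(cluster_sizes: Dict[str, int], seed: int = 0) -> Dict[str, str]:
    """Pair clusters of similar size and randomize one of each pair to treatment."""
    rng = random.Random(seed)
    ordered = sorted(cluster_sizes, key=lambda c: (cluster_sizes[c], c))
    arms: Dict[str, str] = {}
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)
        arms[pair[0]] = "treatment"
        arms[pair[1]] = "control"
    if len(ordered) % 2 == 1:
        # An unpaired leftover cluster gets a fair coin flip.
        arms[ordered[-1]] = rng.choice(["treatment", "control"])
    return arms


def frontier_users(cluster_of: Dict[str, str],
                   edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Users with at least one edge into a different cluster."""
    frontier: Set[str] = set()
    for u, v in edges:
        if cluster_of[u] != cluster_of[v]:
            frontier.update((u, v))
    return frontier


arms = assign_clusters({"c1": 3, "c2": 4, "c3": 20, "c4": 22})
frontier = frontier_users({"a": "c1", "b": "c1", "c": "c2", "d": "c2"},
                          [("a", "b"), ("b", "c"), ("c", "d")])
```

The returned frontier set can then feed whichever rule is chosen: excluding those users from the analysis, holding them out as a buffer, or gating their cross-cluster invitations.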

(d) Choose metrics:

Pick one primary metric and two guardrail metrics.
Critically assess "time spent per user per day" as a success metric (pros, cons, manipulation risks).
Propose at least one viable alternative primary metric with a rationale.

(e) Compute design effect (DE) and sample-size inflation for cluster randomization:

Given average cluster size m = 20 and intracluster correlation ICC = 0.05, compute DE.
Recompute DE if m doubles but ICC halves.
Explain implications for required sample size.
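For part (e), a worked computation assuming the standard equal-cluster-size design effect, DE = 1 + (m - 1) * ICC. The question does not name a formula, so this choice is an assumption, as are the function names below.

```python
import math


def design_effect(m: float, icc: float) -> float:
    """Design effect for equal-sized clusters: DE = 1 + (m - 1) * ICC."""
    return 1.0 + (m - 1.0) * icc


def inflated_sample_size(n_individual: int, m: float, icc: float) -> int:
    """Users needed under cluster randomization to match the power of
    n_individual independently randomized users."""
    return math.ceil(n_individual * design_effect(m, icc))


de_base = design_effect(20, 0.05)      # 1 + 19 * 0.05, about 1.95
de_alt = design_effect(40, 0.025)      # 1 + 39 * 0.025, about 1.975
print(round(de_base, 3), round(de_alt, 3))
```

Doubling m while halving ICC leaves DE almost unchanged (slightly higher), because (m - 1) * ICC moves from 19 * 0.05 to 39 * 0.025. In both cases the required sample is roughly twice what a user-randomized test would need, and the effective sample size is the number of users divided by DE.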

(f) Bias from ignoring network effects:

If you randomize by user and ignore interference, under what conditions is the naïve difference-in-means biased downward vs upward?
Provide intuition for positive vs negative spillovers and one real-world example for each direction.
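The direction of the bias in (f) can be illustrated with a small simulation under a deliberately simple, assumed outcome model: Y_i = tau * Z_i + gamma * (share of treated neighbors), on a ring graph where each user has two neighbors, with independent user-level randomization at probability 0.5. In this model the full-rollout effect (everyone treated versus no one treated) is tau + gamma. The model, graph, and parameter values are all hypothetical.

```python
import random


def simulate_naive(tau: float, gamma: float, n: int = 2000, seed: int = 0) -> float:
    """Naive difference in means under user-level randomization on a ring graph."""
    rng = random.Random(seed)
    z = [rng.random() < 0.5 for _ in range(n)]
    y = []
    for i in range(n):
        # Share of treated neighbors; z[i - 1] wraps around at i = 0.
        share = (z[i - 1] + z[(i + 1) % n]) / 2
        y.append(tau * z[i] + gamma * share)
    treated = [y[i] for i in range(n) if z[i]]
    control = [y[i] for i in range(n) if not z[i]]
    return sum(treated) / len(treated) - sum(control) / len(control)


tau = 1.0
naive_pos = simulate_naive(tau, gamma=1.0)    # positive spillover
naive_neg = simulate_naive(tau, gamma=-0.5)   # negative spillover
global_pos = tau + 1.0                        # full-rollout effect, gamma = 1.0
global_neg = tau - 0.5                        # full-rollout effect, gamma = -0.5
```

In this setup the naive estimate stays near tau in both cases, because control users receive the same average spillover as treated users. It therefore understates the full-rollout effect when spillovers are positive and overstates it when they are negative.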
