This question evaluates experimental design and causal inference skills, including power analysis, clustering and interference mitigation, metric engineering, sequential monitoring, and diagnostic interpretation for large-scale ad experiments.

What difficulty level is this interview question?

This is a hard-difficulty Analytics & Experimentation question, commonly asked during onsite rounds at Netflix.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Netflix during technical interviews.

Design and power a frequency-cap experiment | Netflix Interview Question

Experiment Design: Raising a 7‑Day Frequency Cap from 3→4 Impressions

Context

A large video ad campaign plans to raise the per‑user rolling 7‑day frequency cap from 3 to 4 impressions. The goal is to estimate the causal impact on conversions while accounting for clustering and potential interference (auctions, cross‑device households, overlapping advertisers, pacing).

- Population: US users eligible for the campaign; ~4,000,000 eligible users per day during the test.
- Randomization candidates: user_id, household_id, or geo cell.
  - Average household size among eligible users: m = 1.3.
  - Household-level ICC for the primary metric: 0.02.
- Primary metric: 7‑day conversion rate per unique exposed user (any purchase within 7 days of first exposure), baseline p0 = 2.00%.
- Guardrails: daily unique reach, average session watch time, complaint rate per 1,000 impressions.
- Traffic allocation: 50% Treatment (cap = 4), 50% Control (cap = 3), duration 28 days, with 4 equally spaced looks (3 interim plus the final).
- CUPED: pre‑period 7‑day metric available with R^2 = 0.35.
- Interference risks: shared auctions across campaigns, overlapping advertisers, cross‑device households, pacing controls.

Tasks

(1) Choose the randomization unit and justify it with a causal diagram. Specify where interference could occur and how your choice mitigates it. Propose cross‑campaign holdouts or ghost‑bids if needed.
(2) Define precise metric formulas (numerators/denominators, exposure semantics, attribution window, de‑duplication across devices) and the data to log to compute them unambiguously.
(3) Compute the minimum per‑arm sample size (unique users) to detect an absolute lift from 2.00% to 2.10% (Δ = +0.10 pp) with α = 0.05 (two‑sided) and 1−β = 0.80 using a two‑proportion z‑test. Adjust for clustering via VIF = 1 + (m−1)·ICC, then adjust variance for CUPED by multiplying by (1−R^2). Show the final effective sample size and discuss whether 28 days of traffic suffices.
(4) Specify sequential monitoring using O’Brien–Fleming boundaries for 4 looks: give approximate nominal α at each look and describe the decision rules.
(5) List at least three diagnostic checks (e.g., covariate balance on pre‑period exposures, saturation by user quantile, auction pressure) and the exact plots you would produce. Explain how you would interpret each to decide whether to ship the higher cap.
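
The sample-size arithmetic in the power-calculation task can be sketched in a few lines of Python. This is an illustrative sketch, not an official solution. It assumes the unpooled-variance form of the two-proportion z-test formula (a pooled form gives a slightly different number), and it treats the VIF and the CUPED factor as simple multipliers on the required n. All variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Inputs taken from the problem statement.
p0, p1 = 0.0200, 0.0210          # baseline and target conversion rates
alpha, power = 0.05, 0.80        # two-sided alpha, desired power
m, icc = 1.3, 0.02               # mean household size, household-level ICC
r2 = 0.35                        # CUPED R^2 from the pre-period metric

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
z_beta = NormalDist().inv_cdf(power)            # about 0.84
delta = p1 - p0

# Assumption: unpooled-variance two-proportion formula, per arm.
n_base = (z_alpha + z_beta) ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / delta ** 2

# Inflate for household clustering, then deflate for CUPED variance reduction.
vif = 1 + (m - 1) * icc
n_clustered = n_base * vif
n_final = n_clustered * (1 - r2)

print(f"base n per arm:      {math.ceil(n_base):,}")
print(f"VIF:                 {vif:.3f}")
print(f"after clustering:    {math.ceil(n_clustered):,}")
print(f"after CUPED (final): {math.ceil(n_final):,}")
```

Under these assumptions the requirement is roughly 315,000 unique exposed users per arm before adjustment and roughly 206,000 after the clustering and CUPED adjustments. With about 2,000,000 eligible users per arm per day, 28 days of traffic would comfortably cover this unless only a very small fraction of eligible users is ever exposed. Two caveats apply: users exposed in the final week need 7 more days for their attribution window to close, and group-sequential monitoring inflates the fixed-sample requirement slightly.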

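For the sequential-monitoring task, the approximate nominal α at each look can be sketched the same way. This sketch assumes the classical O’Brien–Fleming design with 4 equally spaced looks, where the critical value at look k is C·sqrt(K/k). The constant C ≈ 2.024 is the commonly tabulated value for K = 4 and two-sided α = 0.05; it is hard-coded here as an assumption rather than derived. A Lan–DeMets spending-function implementation would give slightly different numbers.

```python
import math

K = 4        # number of equally spaced looks, including the final one
C = 2.024    # assumed tabulated O'Brien-Fleming constant (K=4, two-sided alpha=0.05)

def two_sided_p(z: float) -> float:
    """Two-sided tail probability of a standard normal at |Z| >= z."""
    return math.erfc(z / math.sqrt(2))

# Critical z-value and nominal two-sided alpha at each look k = 1..K.
boundaries = [C * math.sqrt(K / k) for k in range(1, K + 1)]
nominal_alpha = [two_sided_p(z) for z in boundaries]

for k, (z, a) in enumerate(zip(boundaries, nominal_alpha), start=1):
    print(f"look {k} (day {7 * k}): |Z| >= {z:.3f}, nominal alpha ~ {a:.5f}")
```

Under these assumptions the nominal α is about 0.00005 at look 1, 0.004 at look 2, 0.019 at look 3, and 0.043 at the final look. One possible decision rule: stop and declare a significant effect at look k only if the CUPED-adjusted |Z| crosses that look's boundary; otherwise continue to the next look, and evaluate the guardrail metrics separately before any ship decision.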