Prove source growth is cannibalization, not incremental
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Onsite
You observe that creation_source = 'web' shows higher revenue in 2026 vs 2025. Design a causal analysis to test whether this growth is primarily cannibalization from other sources (api/mobile) rather than incremental revenue. Specify:
- Identification: Choose and justify an approach (e.g., randomized geo budget shift, difference-in-differences with matched geos, or synthetic control). Define treatment (reducing non-web budgets by X% while holding web constant, or vice versa) and control units, and the time windows.
- Model: Write the core DID equation with unit and time fixed effects and an interaction capturing treatment, plus controls for seasonality, macro trends, advertiser mix, and product changes. State key assumptions (parallel trends, no spillovers) and how you’ll test pre-trends and interference (e.g., cluster or partial interference models).
- Metrics: Define incremental revenue and the substitution/cannibalization rate = -ΔRevenue_other_sources / ΔRevenue_web, measured within randomized units; report confidence intervals. State the unit of analysis (geo, advertiser, or cohort) and how you’ll aggregate.
- Power: Provide a minimal detectable effect calculation given historical variance and sample size (number of geos/advertisers and weeks).
- Robustness: Plan placebo tests on pre-periods, negative-control outcomes, alternative specifications (event study, synthetic control), and sensitivity to heterogeneous treatment effects.
- Decision rule: Pre-specify thresholds on cannibalization rate and incremental ROI that would trigger reallocation. Provide a mock schema for the analysis table you’d need (unit_id, date, source, revenue, spend, cohort, geo, treatment_flag).
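For the Model and Metrics items above, one possible way to write the core specification is sketched below. The notation is an assumption, not part of the prompt: i indexes the unit (for example a geo), s the source group (web, or other = api + mobile), and t the week; α_i and γ_t are unit and time fixed effects; X_ist collects the controls (seasonality, macro trends, advertiser mix, product changes). The regression would be estimated separately for each source group, with standard errors clustered by unit.

```latex
Y_{ist} = \alpha_i + \gamma_t
        + \beta_s \,(\mathrm{Treat}_i \times \mathrm{Post}_t)
        + X_{ist}^{\top}\delta + \varepsilon_{ist},
\qquad
\text{cannibalization rate} = -\frac{\hat{\beta}_{\text{other}}}{\hat{\beta}_{\text{web}}}
```

Under this sketch, a rate near 1 would suggest web growth is mostly offset by losses in other sources, and a rate near 0 would suggest it is mostly incremental. Pre-trends can be probed by replacing Post_t with lead and lag indicators (an event study) and checking that the pre-period coefficients are close to zero.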
Quick Answer: This question tests causal inference and experimental design: choosing an identification strategy, defining treatment and control, specifying a difference-in-differences model or randomized geo experiment, constructing metrics, computing power, and running robustness checks for attribution and cannibalization analysis. It is commonly asked to determine whether observed growth is incremental or substitution-driven. It falls under the Analytics & Experimentation domain in Data Science and emphasizes practical application backed by a conceptual understanding of identification assumptions, statistical power, and measurement.
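The metric and power pieces can be illustrated with a minimal Python sketch. All function names here are hypothetical, the inputs are assumed to be geo-level post-minus-pre revenue changes, and the minimal detectable effect uses a two-sided normal approximation with a common standard deviation across arms. Treat it as a starting point under those assumptions rather than a complete analysis.

```python
import math
import random
from statistics import NormalDist, mean


def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences on group means: treated change minus control change."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))


def cannibalization_rate(delta_web, delta_other):
    """-dRevenue_other_sources / dRevenue_web; 1.0 means web gains are fully offset."""
    if delta_web == 0:
        raise ValueError("web effect is zero; the rate is undefined")
    return -delta_other / delta_web


def geo_bootstrap_ci(treated, control, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the rate, resampling whole geos.

    treated / control: lists of (d_web, d_other) per geo, each a
    post-minus-pre revenue change (assumed input shape).
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        d_web = mean(x[0] for x in t) - mean(x[0] for x in c)
        d_other = mean(x[1] for x in t) - mean(x[1] for x in c)
        if d_web != 0:  # skip draws where the ratio is undefined
            rates.append(-d_other / d_web)
    if not rates:
        raise ValueError("no valid bootstrap draws")
    rates.sort()
    lo = rates[int((1 - level) / 2 * len(rates))]
    hi = rates[max(int((1 + level) / 2 * len(rates)) - 1, 0)]
    return lo, hi


def mde(sigma, n_treat, n_ctrl, alpha=0.05, power=0.80):
    """Minimal detectable difference in means for a two-sided test.

    sigma: standard deviation of the unit-level outcome, e.g. estimated
    from historical geo-level revenue changes.
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_total * sigma * math.sqrt(1 / n_treat + 1 / n_ctrl)


# Toy numbers for illustration only, not real data.
print(cannibalization_rate(delta_web=100.0, delta_other=-60.0))  # prints 0.6
print(mde(sigma=10.0, n_treat=50, n_ctrl=50))
```

The ratio can be unstable when the web effect is close to zero, so the bootstrap interval is worth inspecting alongside the two underlying effects rather than on its own.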