Causal Inference, Difference-In-Differences, And Cannibalization
Asked of: Data Scientist

What's being tested
Meta is testing whether you can turn an ambiguous product or marketing question into a defensible causal estimand, not just describe correlations in `DAU`, clicks, conversions, or revenue. The interviewer wants to see if you can choose between randomization, difference-in-differences, matched-market tests, synthetic control, and other quasi-experimental designs based on what is feasible and what assumptions are credible. For a Data Scientist, the core skill is knowing when growth in one source is truly incremental versus shifted from another source, and how to quantify uncertainty under messy product realities like interference, staggered rollout, selection, and seasonality. Meta cares because product launches, ranking changes, ads, notifications, and creator incentives often affect ecosystems where users, advertisers, and content sources interact.
Core knowledge
- Causal estimand comes before method choice. Define whether you need `ATE`, `ATT`, incremental conversions, net revenue, retained users, or substitution across channels. For cannibalization, estimate both source-level lift and total-system lift: $\Delta_{\text{total}} = \Delta_{\text{source}} + \sum_{j \neq \text{source}} \Delta_j$.
- Cannibalization means a treatment increases one metric source while decreasing another, leaving aggregate value flat or negative. A strong answer decomposes `Total Engagement` into components like `Feed`, `Reels`, `Stories`, `Search`, or `Notifications`, then tests whether source gains are offset elsewhere.
- Difference-in-differences compares treated and control units before and after treatment: $\hat{\tau}_{\text{DiD}} = (\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}) - (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}})$. It identifies the causal effect on the treated (`ATT`) if parallel trends would have held absent treatment.
- Parallel trends diagnostics should use multiple pre-periods, not just one baseline. Plot event-study coefficients, test pre-treatment leads, and compare seasonality. A non-significant pre-trend test is not proof; visual and business plausibility matter, especially around holidays, launches, or news cycles.
- Two-way fixed effects can fail under staggered adoption when treatment effects vary over time or cohorts. Avoid blindly running $Y_{it} = \alpha_i + \lambda_t + \beta D_{it} + \varepsilon_{it}$; consider Callaway-Sant’Anna, Sun-Abraham, or cohort-specific event studies for cleaner group-time effects.
- Geo experiments or matched-market lift tests are useful when user-level randomization is impossible, especially for brand ads, shuttles, local launches, or market-level spend changes. With only ~20–200 geo clusters, use matched pairs, pre-period outcome correlation, cluster-robust errors, and randomization inference.
- Synthetic control constructs a weighted combination of untreated units to match the treated unit pre-period. It is strongest with one or few treated geos and many donor markets; avoid donors that may be exposed through spillovers, shared media markets, or national campaigns.
- Interference violates the stable unit treatment value assumption when one user’s treatment affects another user’s outcome. At Meta, social graph effects are common: notifications, shares, comments, marketplace liquidity, and ad auctions can spill across treatment and control users.
- Cluster randomization handles interference by randomizing groups such as geos, schools, workplaces, advertiser accounts, or graph clusters. The tradeoff is power: effective sample size is driven by the number of clusters, not the number of users, so variance is often much larger than in a user-level A/B test with millions of users.
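- Exposure mapping clarifies what “treated” means under spillovers. Instead of binary assignment only, model outcomes by own treatment and neighbor exposure, e.g. $Y_i = f(Z_i, E_i)$, where $Z_i$ is the user’s own assignment and $E_i$ is the share of their neighbors who are treated. This helps separate direct effects from network effects.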
- Robustness checks make observational causal claims credible. Use placebo outcomes, placebo treatment dates, negative-control geos, alternative control groups, leave-one-market-out sensitivity, covariate balance, pre-period fit, and heterogeneity by segment such as new users, heavy users, advertisers, or creators.
- Power and `MDE` should match the design unit. For clustered tests, approximate detectable effect using cluster-level variance and intraclass correlation, not raw user count. A campaign with 100M impressions but 30 markets may still be underpowered for small brand-lift effects.
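To make the power point concrete, here is a minimal sketch of the standard design-effect approximation, $1 + (m - 1)\rho$, for equal-sized clusters. The function names, the intraclass correlation of 0.01, and the user counts are illustrative assumptions, not figures from any real test.

```python
import math


def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_users: int, n_clusters: int, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    avg_cluster_size = n_users / n_clusters
    return n_users / design_effect(avg_cluster_size, icc)


def mde_two_arm(sd: float, n_eff_per_arm: float,
                z_alpha: float = 1.96, z_power: float = 0.84) -> float:
    """Approximate MDE for a difference in means with two equal arms
    (defaults: two-sided alpha = 0.05, power = 0.80)."""
    return (z_alpha + z_power) * sd * math.sqrt(2.0 / n_eff_per_arm)


# Hypothetical inputs: 3M users spread over 30 markets, ICC assumed to be 0.01.
n_eff = effective_sample_size(n_users=3_000_000, n_clusters=30, icc=0.01)
mde = mde_two_arm(sd=1.0, n_eff_per_arm=n_eff / 2)
print(f"effective n = {n_eff:.0f}, MDE = {mde:.3f} standard deviations")
```

Under these assumed inputs the three million users carry roughly the information of three thousand independent observations, which is why the number of markets, not the number of impressions, tends to drive the detectable effect.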
Tip: In interviews, say the identifying assumption out loud, then explain how you would try to falsify it. That is usually more valuable than naming a fancy method.
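One way to practice "state the assumption, then try to falsify it" is to compute the 2x2 difference-in-differences estimate and then rerun the same calculation on two pre-treatment periods as a placebo. The sketch below uses a made-up toy panel; the group labels, period indices, and outcome values are all hypothetical.

```python
from statistics import mean


def did_estimate(rows, pre, post):
    """2x2 DiD from rows of (group, period, outcome).

    Returns (treated post - treated pre) - (control post - control pre).
    """
    def cell(group, period):
        return mean(y for g, p, y in rows if g == group and p == period)

    treated_change = cell("treated", post) - cell("treated", pre)
    control_change = cell("control", post) - cell("control", pre)
    return treated_change - control_change


# Hypothetical toy panel: periods -2 and -1 are pre-treatment, 0 is post.
rows = [
    ("treated", -2, 10.0), ("treated", -1, 11.0), ("treated", 0, 15.0),
    ("control", -2, 8.0), ("control", -1, 9.0), ("control", 0, 10.0),
]

effect = did_estimate(rows, pre=-1, post=0)    # (15 - 11) - (10 - 9)
placebo = did_estimate(rows, pre=-2, post=-1)  # (11 - 10) - (9 - 8)
print(f"DiD effect = {effect}, placebo (pre-period) DiD = {placebo}")
```

A placebo estimate near zero is consistent with parallel trends but does not prove it; a clearly non-zero placebo is a warning that the control group was already diverging before treatment.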
Worked example
For “Prove source growth is cannibalization, not incremental,” start by clarifying what “source” means: a traffic channel, product surface, recommendation module, notification type, or ad placement. Then define the business outcome hierarchy: source-level `Sessions` or `Clicks` are not enough; you need aggregate `Time Spent`, `Revenue`, `Retention`, or another north-star metric to determine incrementality. A strong first-30-seconds frame is: “I’d estimate the causal effect of increasing exposure to this source on both its own metric and total platform value, then test whether gains are offset by declines in substitute sources.”
Organize the answer into four pillars: metric decomposition, identification strategy, validation checks, and decision rule. For identification, prefer a randomized holdout if feasible: randomly suppress or reduce the source for eligible users, compare total outcomes, and decompose deltas by surface. If randomization is not feasible, propose DiD across users, geos, or cohorts with differential exposure changes, using pre-trend diagnostics and matched controls. The key design decision is whether the unit of analysis should be user-level or cluster-level; if users interact or content supply is shared, cluster randomization or geo-level analysis may be safer despite lower power. Close by saying you would report both gross source lift and net incremental lift: “If source A rises 10M sessions but total sessions rise only 1M while source B falls 9M, I would call most of the growth cannibalized.” If you had more time, you would check heterogeneous effects by new versus existing users, heavy versus light users, and short-term novelty versus longer-term retention.
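The gross-versus-net reporting can be sketched as a small decomposition. This assumes the per-source deltas are already causal estimates (treatment minus control) from the chosen design; the function name and dictionary keys are hypothetical, and the numbers are the ones from the example quote above.

```python
def cannibalization_summary(deltas, source):
    """Split a source's gross lift into net incremental and cannibalized parts.

    deltas: dict mapping each source to its estimated causal change.
    source: the key of the source whose growth is being evaluated.
    """
    gross = deltas[source]
    net = sum(deltas.values())   # total-system lift across all sources
    cannibalized = gross - net   # gross lift offset by declines elsewhere
    share = cannibalized / gross if gross > 0 else float("nan")
    return {
        "gross_lift": gross,
        "net_lift": net,
        "cannibalized": cannibalized,
        "cannibalized_share": share,
    }


# Source A rises 10M sessions while source B falls 9M, so total rises 1M.
summary = cannibalization_summary({"A": 10_000_000, "B": -9_000_000}, "A")
print(summary)
```

In this example 90% of source A's gross growth is offset by source B, which matches the "most of the growth is cannibalized" conclusion. In practice each delta would carry a confidence interval, so the share should be reported with uncertainty rather than as a point value.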
A second angle
For “Design and analyze A/B test with interference,” the same causal logic applies, but the main threat is not omitted-variable bias; it is spillover between experimental units. A naive user-level A/B test might underestimate or overestimate impact if treated users create content, send messages, invite friends, or affect auction prices seen by control users. You should define direct, indirect, and total effects, then choose cluster randomization, graph-cluster assignment, geo randomization, or an exposure model. The tradeoff is precision versus validity: user-level randomization gives huge sample size but biased estimates under interference, while cluster-level randomization reduces contamination but may have much larger standard errors. The analysis should use cluster-level variance estimation, randomization inference when clusters are few, and diagnostics for cross-cluster exposure.
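As an illustration of randomization inference with few clusters, the sketch below re-draws the cluster assignment many times and asks how often a difference in cluster means at least as large as the observed one arises by chance. It assumes complete randomization with a fixed number of treated clusters (a matched-pair design would instead permute within pairs); the cluster names and outcome values are made up.

```python
import random
from statistics import mean


def randomization_p_value(outcomes, treated, n_draws=10_000, seed=0):
    """Two-sided Monte Carlo randomization test on cluster-level means.

    outcomes: dict mapping cluster id to that cluster's mean outcome.
    treated: collection of cluster ids that actually received treatment.
    """
    clusters = list(outcomes)

    def diff(treated_set):
        t = [outcomes[c] for c in clusters if c in treated_set]
        k = [outcomes[c] for c in clusters if c not in treated_set]
        return mean(t) - mean(k)

    observed = diff(set(treated))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_draws):
        # Re-draw which clusters are "treated", keeping the count fixed.
        fake = set(rng.sample(clusters, len(treated)))
        if abs(diff(fake)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_draws


# Hypothetical geo-level means for 8 clusters, 4 of them treated.
outcomes = {"g1": 5.0, "g2": 5.2, "g3": 5.1, "g4": 5.3,
            "g5": 4.0, "g6": 4.1, "g7": 3.9, "g8": 4.2}
observed, p_value = randomization_p_value(outcomes, ["g1", "g2", "g3", "g4"])
print(f"observed difference = {observed:.2f}, p = {p_value:.3f}")
```

Because the unit of analysis is the cluster, the test respects the design: user counts inside each geo do not inflate the apparent precision.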
Common pitfalls
Pitfall: Treating correlation as causation because one source grew while another declined.
A weak answer says, “Source A grew and total stayed flat, so it cannibalized.” A better answer explains the counterfactual: what would total and other sources have done absent the change? Seasonality, concurrent launches, ranking changes, or user mix shifts could produce the same pattern without cannibalization.
Pitfall: Reciting DiD without defending parallel trends.
Interviewers often push on why the control group is valid. Do not just write the DiD formula; explain how you would select controls, inspect pre-period trends, run placebo dates, and handle staggered adoption or heterogeneous effects.
Pitfall: Communicating only methods, not decisions.
A common depth mistake is listing “DiD, synthetic control, IV, regression discontinuity” without tying them to launch/no-launch guidance. Land the business interpretation: incremental lift, confidence interval, substitution pattern, segment heterogeneity, and what decision you would recommend under uncertainty.
Connections
Expect pivots into experiment design, especially cluster randomization, power, `MDE`, and guardrail metrics. If causal validity becomes the focus, the interviewer may ask about instrumental variables, regression discontinuity, synthetic control, or selection bias. For Meta-specific product contexts, be ready to connect this to ranking changes, ads measurement, notification experiments, marketplace liquidity, and recommender-system metric tradeoffs.
Further reading
- Mostly Harmless Econometrics: Angrist and Pischke’s classic treatment of identification, DiD, IV, and regression discontinuity.
- Causal Inference: The Mixtape: Practical explanations of modern causal designs, including DiD, synthetic control, event studies, and matching.
- Difference-in-Differences with Multiple Time Periods: Callaway and Sant’Anna paper explaining why staggered rollout DiD needs more care than standard two-way fixed effects.
Practice questions
- Compare Instagram vs. Facebook using causal experiments (Meta · Data Scientist · Onsite · Medium)
- Build DiD dataset with SQL (Meta · Data Scientist · Technical Screen · Medium)
- Evaluate brand ads effectiveness on social media causally (Meta · Data Scientist · Technical Screen · Medium)
- Estimate revenue of organic shopping tab (Meta · Data Scientist · Onsite · Hard)
- Prove source growth is cannibalization, not incremental (Meta · Data Scientist · Onsite · Hard)
- Handle sales pressure with analytical integrity (Meta · Data Scientist · Technical Screen · Medium)
- Evaluate Chatbot's Retailer Value and Launch Viability (Meta · Data Scientist · Onsite · Hard)
- Evaluating and launching Instagram Stories (Meta · Data Scientist · Onsite · Medium)
Related concepts
- Causal Inference And Difference-In-Differences (Analytics & Experimentation)
- Difference-In-Differences And Quasi-Experiments (Analytics & Experimentation)
- Causal Inference (Analytics & Experimentation)
- Causal Inference, Confounding, And Matching (Analytics & Experimentation)
- Causal Inference And Identification (Statistics & Math)
- Causal Inference And Quasi-Experiments (Analytics & Experimentation)