Cohort, Funnel, Retention, And Churn Analysis

What's being tested

Meta is testing whether you can turn ambiguous product health changes into a disciplined diagnostic analysis: define the right metric, decompose it by cohort/funnel/segment, separate real behavior from measurement artifacts, and propose credible causal tests. The interviewer is not looking for definitions of retention or churn; they are probing whether you can identify which slice, denominator, or time window changes the conclusion. This matters because products like `Facebook Lite`, account switching, creator monetization, and emerging-market launches can move global `DAU`, revenue, and engagement through subtle cohort mix shifts rather than obvious feature failures. A strong Data Scientist should reason from metrics to hypotheses to validation, while being explicit about uncertainty and decision thresholds.

Core knowledge

Cohort analysis groups users by a shared start event, usually install date, signup date, first session, first transfer, or first creator post. Always distinguish acquisition cohort from behavioral cohort: “users who installed in week W” answers different questions than “users who used account switching in week W.”
Retention should be defined with an exact return window and action. Common forms are `D1`, `D7`, `D28`, rolling retention, and bounded retention:
$D7 = \frac{\text{users active on day 7 after cohort entry}}{\text{users in cohort}}$
For `Facebook Lite`, “active” may mean app open, feed load, or meaningful session; these are not interchangeable.
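The difference between retention definitions can be made concrete with a minimal sketch. The user IDs, dates, and the two helper functions below are hypothetical illustrations, and "active" is assumed to mean any qualifying event on that calendar day; day-N retention counts users active exactly N days after entry, while rolling retention counts users active on day N or later.

```python
from datetime import date, timedelta


def day_n_retention(cohort_entry, activity, n):
    """Share of the cohort with a qualifying event exactly n days after entry."""
    if not cohort_entry:
        raise ValueError("cohort is empty")
    retained = sum(
        1
        for user, entry_day in cohort_entry.items()
        if entry_day + timedelta(days=n) in activity.get(user, set())
    )
    return retained / len(cohort_entry)


def rolling_retention(cohort_entry, activity, n):
    """Share of the cohort with a qualifying event on day n or any later day."""
    if not cohort_entry:
        raise ValueError("cohort is empty")
    retained = sum(
        1
        for user, entry_day in cohort_entry.items()
        if any(day >= entry_day + timedelta(days=n)
               for day in activity.get(user, set()))
    )
    return retained / len(cohort_entry)


# Hypothetical toy data: cohort entry date and active days per user.
cohort_entry = {
    "u1": date(2024, 1, 1),
    "u2": date(2024, 1, 1),
    "u3": date(2024, 1, 2),
    "u4": date(2024, 1, 2),
}
activity = {
    "u1": {date(2024, 1, 2), date(2024, 1, 8)},  # returns on day 1 and day 7
    "u2": {date(2024, 1, 2)},                    # returns on day 1 only
    "u3": {date(2024, 1, 9)},                    # returns on day 7 only
    # u4 never returns, so it has no activity entry
}

d1 = day_n_retention(cohort_entry, activity, 1)
d7 = day_n_retention(cohort_entry, activity, 7)
rolling_d1 = rolling_retention(cohort_entry, activity, 1)
print(d1, d7, rolling_d1)
```

The denominator stays fixed at the full cohort in both functions, so the two definitions differ only in which return events count.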
Churn is the complement of retention only under a specific observation window. If $S(t)$ is the survival probability, churn by time $t$ is $1-S(t)$. Use Kaplan-Meier survival curves when users enter at different dates or are right-censored; do not treat unobserved future inactivity as churn.
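As a rough illustration of how right-censoring is handled, the Kaplan-Meier product-limit estimate can be sketched in plain Python. The durations and churn flags are made-up toy values, and `kaplan_meier` is a hypothetical helper, not a library call; a censored user (flag `False`) stays in the at-risk set until their last observed day and is never counted as a churn event.

```python
def kaplan_meier(durations, churned):
    """Return [(t, S(t))] at each time t where at least one churn is observed.

    durations[i] is the number of days user i was observed;
    churned[i] is True if churn was observed, False if right-censored.
    """
    if len(durations) != len(churned):
        raise ValueError("durations and churned must have the same length")
    event_times = sorted({t for t, e in zip(durations, churned) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Users still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Observed churn events at exactly time t.
        events = sum(1 for d, e in zip(durations, churned) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: three observed churns, three censored users.
durations = [3, 5, 5, 8, 10, 10]
churned = [True, True, False, True, False, False]

curve = kaplan_meier(durations, churned)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.3f}")
```

Churn by day $t$ is then read off as $1-S(t)$ from the last curve point at or before $t$.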
Funnel analysis decomposes conversion into ordered steps: install $\rightarrow$ app open $\rightarrow$ login $\rightarrow$ feed render $\rightarrow$ meaningful interaction $\rightarrow$ return. Step conversion is $\frac{N_{i+1}}{N_i}$, while overall conversion is $\frac{N_k}{N_0}$. Drops at early steps often imply activation or instrumentation issues; later drops often imply product value or relevance.
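The step and overall conversion formulas can be sketched as below. The step counts are invented for illustration and `funnel_report` is a hypothetical helper; the sketch assumes each step's population is a subset of the previous step's.

```python
def funnel_report(step_counts):
    """step_counts: ordered list of (step_name, users_reaching_step).

    Returns rows of (name, users, step_conversion, overall_conversion);
    step_conversion is None for the first step.
    """
    if not step_counts or step_counts[0][1] <= 0:
        raise ValueError("funnel needs a non-empty first step")
    base = step_counts[0][1]
    rows = []
    prev = None
    for name, n in step_counts:
        if prev is None:
            step_rate = None
        elif prev == 0:
            raise ValueError(f"no users reached the step before {name!r}")
        else:
            step_rate = n / prev
        rows.append((name, n, step_rate, n / base))
        prev = n
    return rows


# Hypothetical counts for one install cohort.
steps = [
    ("install", 1000),
    ("app open", 800),
    ("login", 600),
    ("feed render", 540),
    ("meaningful interaction", 270),
    ("return", 162),
]

report = funnel_report(steps)
overall = report[-1][3]
# The weakest step is the one with the lowest step conversion.
worst_step = min(report[1:], key=lambda row: row[2])[0]
print(overall, worst_step)
```

Ranking steps by step conversion rather than by absolute user loss keeps a large early drop from hiding a proportionally worse later one.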
Segment decomposition should start broad and move specific: geography, OS version, app version, device class, network type, language, acquisition channel, new versus resurrected users, and account type. Use weighted decomposition:
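$\Delta M = \sum_s w_{1s}M_{1s}-\sum_s w_{0s}M_{0s} = \sum_s (w_{1s}-w_{0s})M_{0s} + \sum_s w_{1s}(M_{1s}-M_{0s})$
where $w_{ts}$ is segment $s$’s share of users and $M_{ts}$ is its metric value in period $t$. The first sum is the segment-mix effect and the second is the within-segment change.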
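A small numeric sketch shows how an aggregate drop can be driven entirely by mix. The segment names, weights, and retention rates are hypothetical, and `decompose_change` is an illustrative helper that evaluates the mix term at baseline metric values and the within-segment term at new weights; other orderings of the decomposition are equally valid and give slightly different splits.

```python
def decompose_change(before, after):
    """Split an aggregate metric change into mix and within-segment parts.

    before/after map segment -> (weight, metric), with weights summing to 1
    in each period. A segment missing from one period is treated as having
    weight 0 and metric 0 there, which is an assumption of this sketch.
    """
    mix = 0.0
    within = 0.0
    for s in set(before) | set(after):
        w0, m0 = before.get(s, (0.0, 0.0))
        w1, m1 = after.get(s, (0.0, 0.0))
        mix += (w1 - w0) * m0      # change in segment share, at baseline metric
        within += w1 * (m1 - m0)   # change in segment metric, at new share
    return mix, within


# Hypothetical D7 retention by market: rates are unchanged, but share shifts
# toward the lower-retention market.
before = {"market_A": (0.7, 0.40), "market_B": (0.3, 0.20)}
after = {"market_A": (0.5, 0.40), "market_B": (0.5, 0.20)}

total_before = sum(w * m for w, m in before.values())
total_after = sum(w * m for w, m in after.values())
mix, within = decompose_change(before, after)
print(total_after - total_before, mix, within)
```

Here aggregate retention falls by four percentage points while the within-segment term is zero, which points at acquisition mix rather than product degradation.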
Metric validation is part of DS diagnosis, but keep it analytical: compare independent signals such as `DAU`, sessions, feed impressions, login success, crash rate, and server-side events. If `DAU` drops but impressions and sessions do not, suspect definition, identity resolution, logging, or denominator changes before claiming product harm.
Count distributions matter for launch-period behavior. Transfers, sessions, and account switches are often zero-inflated and overdispersed; a Poisson model assumes $E[Y]=Var(Y)$, while a negative binomial model handles $Var(Y)>E[Y]$. Report median, p90, and zero share, not just mean.
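A quick descriptive check along these lines, before fitting any model, might look like the sketch below. The per-user counts are invented, `count_summary` is a hypothetical helper, and a variance-to-mean ratio well above 1 is treated only as a rough signal of overdispersion, not a formal test.

```python
from statistics import mean, median, quantiles, variance


def count_summary(counts):
    """Summarize a per-user count metric beyond its mean."""
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    m = mean(counts)
    v = variance(counts)  # sample variance
    return {
        "mean": m,
        "variance": v,
        # Poisson implies a ratio near 1; much larger suggests overdispersion.
        "dispersion_ratio": v / m if m > 0 else float("nan"),
        "median": median(counts),
        # Last of the nine decile cut points is the 90th percentile.
        "p90": quantiles(counts, n=10)[-1],
        "zero_share": sum(1 for c in counts if c == 0) / len(counts),
    }


# Hypothetical weekly account-switch counts for ten users.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 20]
summary = count_summary(counts)
for key, value in summary.items():
    print(key, value)
```

In this toy sample the mean is 2.6 even though most users have zero switches, which is the situation where reporting only the mean misleads.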
Identity resolution is central when account switching rises while actives fall. A user-level metric may diverge from an account-level metric if one person uses multiple accounts or multiple people share a device. Clarify whether the unit is person, account, device, family of accounts, or session.
Causal inference starts after diagnosis narrows hypotheses. Prefer randomized experiments when possible; otherwise, consider difference-in-differences, matched cohorts, synthetic controls, or interrupted time series. For DiD, state the parallel-trends assumption and check pre-period trends before attributing retention movement to a launch or policy change.
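The difference-in-differences arithmetic and a crude pre-trend check can be sketched as follows. The weekly `D7` values for the treated and control markets are hypothetical, both helpers are illustrative, and a flat pre-period gap is only suggestive evidence for parallel trends, not proof.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences on period means."""
    return (treated_post - treated_pre) - (control_post - control_pre)


def pre_trend_gap_changes(treated_series, control_series):
    """Period-over-period change in the treated-minus-control gap.

    Values near zero across the pre-period are consistent with parallel trends.
    """
    gaps = [t - c for t, c in zip(treated_series, control_series)]
    return [later - earlier for earlier, later in zip(gaps, gaps[1:])]


# Hypothetical weekly D7 retention before the launch.
treated_pre_weeks = [0.30, 0.31, 0.30]
control_pre_weeks = [0.25, 0.26, 0.25]
# Hypothetical D7 retention after the launch.
treated_post = 0.26
control_post = 0.24

gap_changes = pre_trend_gap_changes(treated_pre_weeks, control_pre_weeks)
effect = did_estimate(
    mean(treated_pre_weeks), treated_post,
    mean(control_pre_weeks), control_post,
)
print(gap_changes, effect)
```

The estimate of about minus three percentage points nets out the decline that the control market also experienced.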
Cannibalization analysis asks whether one source’s gains come at another source’s expense. For revenue shifts across creator surfaces, analyze total revenue, source-specific revenue, creator-level panel outcomes, and substitution patterns. A feature that increases `Reels` revenue but lowers `Feed` revenue may still be positive if total incremental revenue and creator retention improve.
Statistical uncertainty should be explicit. Use confidence intervals for cohort rates:
$SE(\hat{p})=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
For many segment cuts, control false positives with Benjamini-Hochberg or prioritize effect size plus sample size over raw p-values.
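Both ideas can be sketched in a few lines. The cohort size and p-values are hypothetical, the interval is the simple normal-approximation (Wald) interval implied by the standard error above, and `benjamini_hochberg` is an illustrative implementation of the step-up procedure rather than a library function.

```python
from math import sqrt


def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a cohort rate."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its step-up threshold.
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical cohort: 400 of 1000 users retained at D7.
low, high = wald_interval(400, 1000)

# Hypothetical p-values from five segment-level comparisons.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
rejected = benjamini_hochberg(p_values)
naive = [i for i, p in enumerate(p_values) if p < 0.05]
print((low, high), rejected, naive)
```

With these toy inputs, an unadjusted 0.05 cutoff flags four segments while the adjusted procedure keeps two. For small cohorts or rates near 0 or 1, the Wald interval is known to behave poorly, so a Wilson interval is a common substitute.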
Decision framing links metrics to action. A good answer ends with “if hypothesis X is true, I would do Y”: roll back a release, fix localization, change onboarding, run an A/B test, or monitor a guardrail. Avoid endless slicing without a next decision.

Worked example

For “Diagnose Causes of Low Retention for FB Lite,” a strong candidate would first clarify the product scope: “Are we seeing lower `D1`, `D7`, or `D28` retention; is this among new installs, all users, or a country-specific cohort; and did acquisition volume, app version, or measurement definitions change?” They would state assumptions: treat retention as user-level bounded return after install, and focus on recent cohorts compared with historical baselines and comparable markets.

The answer skeleton should have four pillars. First, validate the metric by comparing client-side activity with server-side engagement signals such as sessions, feed loads, and login events. Second, decompose the retention drop by cohort date, country, device RAM, Android version, network quality, language, app version, and acquisition channel. Third, run funnel analysis from install through activation to identify whether users fail to open, log in, load feed, find friends, or return after a successful first session. Fourth, test causal hypotheses: app performance regression, poor localization, changed acquisition mix, notification delivery issues, or content relevance decline.

One explicit tradeoff is speed versus causal certainty. In an incident-style retention drop, you may first use segmented observational evidence to isolate a likely culprit, but you should avoid overclaiming causality until you can compare exposed versus unexposed cohorts or run a rollback or A/B test. A strong close would be: “If I had more time, I would build a cohort survival view by install week and market, estimate the contribution of acquisition mix versus within-segment degradation, and propose one targeted experiment or rollback based on the largest validated driver.”

A second angle

For “Analyze Revenue Shifts to Identify Cannibalization Effects,” the same cohort and decomposition logic applies, but the unit and outcome change. Instead of asking whether users return, you ask whether creators or advertisers shifted revenue-producing activity from one surface to another and whether total revenue increased. The key framing is panel-based: compare the same creators before and after launch, segment by prior surface usage, and separate new incremental creators from existing creators reallocating activity. The causal challenge is harder because source-level revenue can move in opposite directions while aggregate revenue is flat, so you need counterfactual reasoning rather than a simple funnel drop diagnosis. A good answer would explicitly track total revenue, source mix, creator retention, and heterogeneity across creator segments.

Common pitfalls

Pitfall: Treating aggregate retention as the truth.

A tempting answer is “`D7` fell 5%, so the product got worse.” That ignores mix shift, which in the extreme case produces Simpson’s paradox: retention may be stable within each country yet fall in aggregate because acquisition shifted toward lower-retention markets or channels. It also leaves unclear whether “5%” means a relative drop or five percentage points. A better answer decomposes aggregate movement into segment mix and within-segment changes before diagnosing cause.

Pitfall: Jumping to product narratives before validating measurement.

Candidates often say “users dislike the new onboarding” without checking whether active-user definitions, account identity, app versions, or logging coverage changed. At Meta scale, a metric discontinuity can be caused by a definition change, rollout mismatch, or duplicate-account handling. Land better by saying, “I’d first triangulate with independent engagement signals, then move to behavioral hypotheses.”

Pitfall: Over-slicing without prioritization.

It is easy to list 20 dimensions and sound busy. Interviewers prefer a ranked plan: start with time, geography, app version, acquisition channel, and device/network constraints because they map to plausible mechanisms and actions. For every slice, explain what result would confirm or falsify a hypothesis.

Connections

Interviewers may pivot from here into A/B testing, especially how to design an experiment after diagnosing a retention issue. They may also probe causal inference, including difference-in-differences, synthetic controls, or matched cohorts when randomization is unavailable. Related areas include metric design, survival analysis, segmentation strategy, and product anomaly diagnosis.
