Cohort, Funnel, and Retention Analysis

What's being tested

Interviewers are testing whether you can turn behavioral event logs into decision-quality product insights: define the right cohort, construct the right denominator, diagnose where users drop off, and distinguish real retention change from instrumentation or mix shift. At Meta, this matters because small percentage-point changes in activation, session frequency, or creator retention can affect millions of users and downstream ecosystem health. The interviewer is probing whether you can reason from first principles under messy data constraints, not whether you can recite “D1/D7 retention” definitions. Strong answers connect metrics to product mechanisms, statistical validity, and actionable next steps.

Core knowledge

Cohort definition is the foundation. A cohort is a group of users who share an entry event within the same time window, e.g. “users who completed signup in calendar week $w$.” Define eligibility, entry event, geography, platform, and whether users can enter once or multiple times. Ambiguous cohorts create misleading retention curves.
Retention is a conditional probability: the chance that a user is active at a later time, given that they entered the cohort. Classic day-$n$ retention is $R_n=\frac{\#\text{users active on day }n\text{ after cohort entry}}{\#\text{eligible users in cohort}}$. Rolling retention uses “active on or after day $n$,” while bracket retention uses windows such as days 7–13. Choose based on product usage frequency.
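The three definitions above can be compared on the same data. The following is a minimal sketch, assuming a hypothetical in-memory cohort mapping and activity list; the function name, parameters, and sample users are illustrative and not taken from any real pipeline.

```python
from datetime import date

def retention_variants(cohort, activity, n, bracket=(7, 13)):
    """Compute classic, rolling, and bracket retention for one cohort.

    cohort:   {user_id: entry_date} (hypothetical shape)
    activity: iterable of (user_id, active_date) pairs
    """
    if not cohort:
        raise ValueError("cohort must not be empty")
    # Collect the set of day offsets on which each cohort user was active.
    offsets = {user: set() for user in cohort}
    for user, day in activity:
        if user in cohort:
            delta = (day - cohort[user]).days
            if delta >= 1:  # day 0 is the entry day itself
                offsets[user].add(delta)
    size = len(cohort)  # the denominator is always the full entry cohort
    lo, hi = bracket
    classic = sum(n in days for days in offsets.values()) / size
    rolling = sum(any(d >= n for d in days) for days in offsets.values()) / size
    bracketed = sum(any(lo <= d <= hi for d in days) for days in offsets.values()) / size
    return {"classic": classic, "rolling": rolling, "bracket": bracketed}

# Hypothetical sample: five users who all entered on the same day.
entry = date(2024, 1, 1)
cohort = {u: entry for u in ("u1", "u2", "u3", "u4", "u5")}
activity = [
    ("u1", date(2024, 1, 8)),   # offset 7
    ("u2", date(2024, 1, 10)),  # offset 9
    ("u3", date(2024, 1, 4)),   # offset 3
    ("u5", date(2024, 1, 21)),  # offset 20
]
result = retention_variants(cohort, activity, n=7)
print(result)  # {'classic': 0.2, 'rolling': 0.6, 'bracket': 0.4}
```

On this sample the same cohort shows 20%, 60%, or 40% retention depending on the definition, which is why the definition should be stated before any number is quoted.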
Meta-style products often need frequency-aware retention. Daily active apps may use D1/D7/D28 retention, but lower-frequency surfaces like Marketplace selling, Groups posting, or Reels creation may need weekly or monthly active retention. A bad metric can falsely label healthy infrequent behavior as churn.
Funnels measure ordered conversion through product steps. For steps $S_1 \rightarrow S_2 \rightarrow \dots \rightarrow S_k$, step conversion for $i \ge 2$ is $C_i=\frac{\#\text{users reaching }S_i}{\#\text{users reaching }S_{i-1}}$. Overall conversion is $\#S_k/\#S_1$, which equals $\prod_{i=2}^{k} C_i$ when each count includes only users who passed the preceding steps in order. Enforce ordering and time windows, or users who skip/retry steps will distort results.
Use the right event grain. Event logs hold many rows per user, so deduplicate per user per step unless you are analyzing attempts. For example, COUNT(DISTINCT user_id) answers user conversion; COUNT(*) answers event volume. In engines such as Hive, Presto, or Spark, approximate distinct counts based on HyperLogLog (for example approx_distinct in Presto or approx_count_distinct in Spark SQL) trade a small error for much lower cost at billions of rows.
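Beware censoring in young cohorts. A cohort that joined three days ago cannot yet have a D7 retention value. Either exclude incomplete cohorts, mark values as censored, or use survival analysis. With churn defined as an explicit event, the Kaplan-Meier estimator gives the probability of not yet having churned by time $t$ under right-censoring: $\hat S(t)=\prod_{i:t_i\le t}\left(1-\frac{d_i}{n_i}\right)$, where $d_i$ is the number of users who churn at time $t_i$ and $n_i$ is the number still at risk just before $t_i$.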
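The Kaplan-Meier product can be computed directly from per-user durations. This is a minimal sketch, assuming each user has already been reduced to a hypothetical (duration_days, churned) pair; how churn is defined (for example, a fixed inactivity threshold) is an assumption left to the analyst, and the sample data is invented for illustration.

```python
from collections import Counter

def kaplan_meier(observations):
    """Kaplan-Meier survival curve from (duration_days, churned) pairs.

    churned=True  : the user churned at duration_days
    churned=False : the user was still retained when observation ended
                    (right-censored at duration_days)
    Returns a list of (time, survival_probability) at each churn time.
    """
    obs = list(observations)
    churns = Counter(t for t, churned in obs if churned)  # d_i per time
    exits = Counter(t for t, _ in obs)                    # churned + censored per time
    at_risk = len(obs)                                    # n_i before the first time
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = churns.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Users who churned or were censored at t leave the risk set afterwards,
        # so users censored at t still count as at risk for churns at t.
        at_risk -= exits[t]
    return curve

# Hypothetical sample: six users, three of them censored.
sample = [(2, True), (3, False), (5, True), (5, True), (8, False), (8, False)]
curve = kaplan_meier(sample)
for t, s in curve:
    print(t, round(s, 4))
```

Here the user censored at day 3 still counts toward the risk set at day 2 but not at day 5, which is how the estimator uses partial information from young cohorts instead of discarding them.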
Segment before explaining. Overall retention can move because of product experience changes or because cohort mix changed. Slice by country, acquisition channel, app version, device class, language, age of account, new vs resurrected users, and traffic source. Simpson’s paradox is common in global products.
Separate instrumentation drops from behavioral drops. Check event volume, logging schema changes, app releases, bot filtering, privacy consent changes, delayed ingestion, and client/server parity. If iOS events dropped 30% exactly after a release while server-side sessions are flat, it is likely logging, not churn.
Funnels need latency windows and attribution rules. Decide whether conversion must happen in one session, within 24 hours, or within seven days. For Meta Ads onboarding, conversion from account creation to first campaign may take days; for feed interaction, seconds or minutes may be appropriate.
Retention curves reveal more than point estimates. Plot $R_1, R_7, R_{14}, R_{28}$ by cohort start date. A steep early drop suggests activation/onboarding issues; a parallel downward shift suggests acquisition quality; a late divergence suggests long-term value, notification fatigue, or content supply problems.
Quantify uncertainty and practical significance. For a retention proportion $\hat p$, the approximate standard error is $\sqrt{\hat p(1-\hat p)/n}$. With very large Meta-scale $n$, tiny differences become statistically significant, so discuss effect size, guardrails, and business impact, not just $p$-values.
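To see how scale affects significance, the standard-error formula can be applied to two cohorts. This is a minimal sketch using a normal-approximation interval for the difference of two independent proportions; the cohort sizes and retained counts are invented for illustration, and the function names are hypothetical.

```python
import math

def retention_se(retained, n):
    """Return (p_hat, standard error) for a retention proportion."""
    p = retained / n
    return p, math.sqrt(p * (1 - p) / n)

def compare_retention(retained_a, n_a, retained_b, n_b, z=1.96):
    """Difference in retention (b minus a) with an approximate 95% interval.

    Assumes the two cohorts are independent and large enough for the
    normal approximation to hold.
    """
    p_a, se_a = retention_se(retained_a, n_a)
    p_b, se_b = retention_se(retained_b, n_b)
    diff = p_b - p_a
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, (diff - z * se_diff, diff + z * se_diff)

# Hypothetical cohorts of two million users each: 40.00% vs 40.12% retention.
diff, (low, high) = compare_retention(800_000, 2_000_000, 802_400, 2_000_000)
print(f"difference: {diff:.4%}, interval: ({low:.4%}, {high:.4%})")
```

With these invented numbers the interval excludes zero even though the difference is about 0.12 percentage points, so whether the change matters is a product judgment rather than a statistical one.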
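SQL implementation should be explicit. A typical pattern is: build a cohort table, build an activity table, left-join activity to the cohort on user_id so that inactive users stay in the denominator, compute the day offset between activity_date and cohort_date (for example with DATEDIFF, whose argument order and syntax differ across SQL dialects), and aggregate distinct users by day offset. For massive datasets, pre-aggregate daily user activity and partition by date to avoid scanning raw logs repeatedly.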
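The same pattern can be sketched outside SQL to make each step visible. The following is a minimal Python illustration, assuming hypothetical row shapes of (user_id, cohort_date) and (user_id, activity_date) and an as_of date marking the last day with complete data; offsets a cohort is too young to have reached are returned as None rather than as zero.

```python
from collections import defaultdict
from datetime import date

def retention_table(cohort_rows, activity_rows, offsets, as_of):
    """Return {cohort_date: {offset: rate or None}}.

    A rate is None when the cohort has not yet reached that day offset
    as of the last complete data date (as_of).
    """
    cohort_of = dict(cohort_rows)  # assumes one entry date per user
    cohort_size = defaultdict(int)
    for cohort_date in cohort_of.values():
        cohort_size[cohort_date] += 1
    # Distinct active users per (cohort_date, day offset); the set
    # deduplicates repeated events from the same user on the same day.
    active = defaultdict(set)
    for user, activity_date in activity_rows:
        if user in cohort_of:
            offset = (activity_date - cohort_of[user]).days
            active[(cohort_of[user], offset)].add(user)
    table = {}
    for cohort_date, size in sorted(cohort_size.items()):
        age = (as_of - cohort_date).days
        table[cohort_date] = {
            n: (len(active[(cohort_date, n)]) / size if age >= n else None)
            for n in offsets
        }
    return table

# Hypothetical data: one mature cohort and one young cohort.
cohorts = [("u1", date(2024, 1, 1)), ("u2", date(2024, 1, 1)),
           ("u3", date(2024, 1, 1)), ("u4", date(2024, 1, 1)),
           ("u5", date(2024, 1, 6)), ("u6", date(2024, 1, 6))]
events = [("u1", date(2024, 1, 2)), ("u1", date(2024, 1, 2)),  # duplicate event
          ("u2", date(2024, 1, 2)), ("u1", date(2024, 1, 8)),
          ("u5", date(2024, 1, 7))]
table = retention_table(cohorts, events, offsets=[1, 7], as_of=date(2024, 1, 9))
for cohort_date, row in table.items():
    print(cohort_date, row)
```

The denominator comes from the cohort rows alone, so users with no activity still count, and the January 6 cohort reports no D7 value instead of a misleading 0%.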

Worked example

“How would you analyze a drop in Day-7 retention?”

In the first 30 seconds, a strong candidate would clarify the product surface, user population, retention definition, time period, and whether the drop is relative to prior cohorts, a forecast, or an experiment control. They might say: “I’ll assume D7 retention means users who joined on day 0 and had at least one qualifying active event exactly seven days later; I’d first verify the metric and then decompose the drop.” The answer should be organized around four pillars: metric validation, cohort construction, segmentation/diagnosis, and action or experimentation.

For metric validation, check whether the denominator changed, whether D7 cohorts are fully mature, and whether app releases or logging changes affected the active event. For cohort construction, ensure day-0 users are comparable: same signup definition, no duplicate accounts, exclude test users and bots, and handle time zones consistently. For segmentation, compare retention curves by acquisition channel, country, platform, app version, and activation actions taken in the first session, such as adding friends, following creators, joining groups, or watching reels. One design decision to flag explicitly is whether to use exact-day D7 retention or bracketed days 6–8; exact-day is cleaner but noisy for users with weekly or irregular usage patterns. The close should propose next steps: “If I had more time, I’d identify the largest contributing segment, inspect upstream activation funnels, and design an experiment or product fix targeted to the segment driving the decline.”
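One way to make the segmentation step concrete is to split the change in overall retention into a mix effect (segment shares moved) and a rate effect (within-segment retention moved). This is a minimal sketch using a symmetric midpoint decomposition, which is one of several possible conventions; the segment names and counts are invented for illustration.

```python
def decompose_retention_change(before, after):
    """Split the change in overall retention into mix and rate effects.

    before/after: {segment: (cohort_size, retained_users)} with the same
    segments in both periods and cohort_size > 0 (assumptions of this sketch).
    Returns (total_change, mix_effect, rate_effect), where
    total_change == mix_effect + rate_effect up to floating-point error.
    """
    if set(before) != set(after):
        raise ValueError("both periods must contain the same segments")

    def shares_and_rates(period):
        total = sum(size for size, _ in period.values())
        return {seg: (size / total, kept / size)
                for seg, (size, kept) in period.items()}

    b, a = shares_and_rates(before), shares_and_rates(after)
    mix_effect = rate_effect = 0.0
    for seg in b:
        w0, r0 = b[seg]
        w1, r1 = a[seg]
        mix_effect += (w1 - w0) * (r0 + r1) / 2   # share change at the average rate
        rate_effect += (r1 - r0) * (w0 + w1) / 2  # rate change at the average share
    overall_before = sum(w * r for w, r in b.values())
    overall_after = sum(w * r for w, r in a.values())
    return overall_after - overall_before, mix_effect, rate_effect

# Hypothetical cohorts: per-segment D7 retention is unchanged (50% and 20%),
# but paid acquisition grows from 20% to 50% of new users.
before = {"organic": (8000, 4000), "paid": (2000, 400)}
after = {"organic": (5000, 2500), "paid": (5000, 1000)}
total, mix, rate = decompose_retention_change(before, after)
print(round(total, 4), round(mix, 4), round(rate, 4))
```

In this invented case overall retention falls by nine percentage points entirely through mix, which points toward acquisition rather than a product regression.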

A second angle

“Where are users dropping off in a signup or onboarding funnel?”

The same reasoning applies, but the primary object is an ordered path rather than a time-indexed cohort curve. The first clarification is whether the funnel is strict-order, whether steps can be skipped, and what conversion window defines success. Instead of asking “are users active seven days later,” you ask “conditional on reaching step $i$, what fraction reaches step $i+1$, and how long does it take?” The biggest constraint is attribution: a user may start signup on mobile, verify email on web, and complete profile later, so identity stitching and cross-device event consistency become central. A strong answer still segments heavily, but now by browser, locale, network quality, error code, permission prompt, and app version to isolate the broken step.
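A strict-order funnel with a conversion window can be sketched as follows. This is a simplified illustration, assuming events already carry a stitched user_id and a numeric timestamp in seconds; the step names are hypothetical, and the window is anchored at each user's first occurrence of the first step, which is only one possible attribution rule.

```python
def funnel_counts(events, steps, window_seconds):
    """Count users reaching each step in order within the window.

    events: iterable of (user_id, step_name, timestamp_seconds)
    steps:  ordered list of step names defining the funnel
    Each user is counted at most once per step.
    """
    by_user = {}
    for user, step, ts in events:
        by_user.setdefault(user, []).append((ts, step))
    reached = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort()  # process each user's events in time order
        next_step, start = 0, None
        for ts, step in user_events:
            if next_step == len(steps):
                break  # user already completed the funnel
            if step != steps[next_step]:
                continue  # out-of-order, repeated, or unrelated event
            if next_step == 0:
                start = ts  # window starts at the first step
            elif ts - start > window_seconds:
                break  # later steps fall outside the conversion window
            reached[next_step] += 1
            next_step += 1
    return reached

def step_conversions(reached):
    """Conversion from each step to the next; None if the prior step is empty."""
    return [reached[i] / reached[i - 1] if reached[i - 1] else None
            for i in range(1, len(reached))]

# Hypothetical signup funnel with a 24-hour window.
steps = ["view_signup", "submit_form", "verify_email"]
events = [
    ("u1", "view_signup", 0), ("u1", "submit_form", 60), ("u1", "verify_email", 3600),
    ("u2", "view_signup", 0), ("u2", "view_signup", 10), ("u2", "submit_form", 100),
    ("u3", "submit_form", 5), ("u3", "view_signup", 50),
    ("u4", "view_signup", 0), ("u4", "submit_form", 100), ("u4", "verify_email", 200000),
]
reached = funnel_counts(events, steps, window_seconds=86400)
print(reached, step_conversions(reached))
```

In this sample the retry by u2 is counted once, the out-of-order submit by u3 is ignored, and the late verification by u4 does not count as a conversion. Each of those outcomes reflects a rule that should be stated up front, since a different rule would change the reported drop-off.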

Common pitfalls

Analytical mistake: using the wrong denominator. A tempting answer is “D7 retention fell because active users fell,” but retention should be computed over a fixed entry cohort, not all users active that week. A better answer distinguishes cohort size, retained users, and overall DAU, then checks whether the denominator composition changed.

Communication mistake: jumping straight to SQL or charts. Many candidates start with SELECT COUNT(DISTINCT user_id) before defining what counts as retained, activated, or converted. Interviewers expect a crisp metric definition first, then a plan for validation, segmentation, and interpretation.

Depth mistake: treating correlation as diagnosis. Saying “Android retention is lower, so Android caused the drop” is incomplete. A stronger answer asks whether Android mix increased, whether a release introduced logging loss, whether acquisition campaigns shifted toward lower-intent users, and whether the Android-specific effect remains after controlling for country and channel.

Connections

Expect pivots into experimentation, especially how to measure whether an onboarding change improves retention without hurting downstream quality. Interviewers may also move toward causal inference, metric design, time-series anomaly detection, or SQL/data modeling for event logs. If the conversation gets deeper, be ready to discuss survival analysis, heterogeneous treatment effects, and guardrail metrics such as sessions, spam reports, notification opt-outs, and long-term engagement.