Retention, Cohort, And Funnel Analysis
Asked of: Data Scientist

What's being tested
Ability to operationalize retention and funnel analyses: pick clear metrics, define cohorts and windows, implement reliably (SQL/analytics tools), and interpret results accounting for censoring, confounders, and instrumentation errors.
Core knowledge
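- Day-N retention = % of users in an acquisition cohort with at least one qualifying event on day N after acquisition (day 0 is the acquisition day). This differs from rolling retention, which counts activity on or after day N.
- Cohort keys: first_touch_date, acquisition_channel, experiment_arm. Finer cohort granularity means smaller cohorts and noisier estimates.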
- Survival/Kaplan–Meier handles right-censoring for long-term retention estimation.
- Funnel basics: ordered steps, conversion rate per step, time-bounded vs. lifetime funnels.
- Instrumentation pitfalls: duplicate events, missing user_id, client-side sampling, inconsistent event deduplication.
- SQL patterns: window functions for first_event_date, cohort pivot via GROUP BY + DATE_DIFF, dedupe with ROW_NUMBER().
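- Statistical tests: bootstrap or two-proportion z-test for comparing retention between cohorts or arms. Retention at different days is measured on the same users, so tests across time points are not independent.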
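Two of the items above, the time-bounded funnel and the Kaplan–Meier estimate, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `funnel_counts` and `kaplan_meier`, the `(user_id, step_name, timestamp)` event shape, and the rule that the time bound starts at a user's first step-0 event are all hypothetical choices made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def funnel_counts(events, steps, window):
    """Count users reaching each ordered step within `window` of their first step-0 event.

    events: iterable of (user_id, step_name, timestamp) tuples (assumed shape).
    steps:  ordered list of step names; a step only counts after the previous one.
    """
    by_user = defaultdict(list)
    for user_id, step, ts in events:
        by_user[user_id].append((ts, step))

    reached = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort()  # chronological order
        next_step, start = 0, None
        for ts, step in user_events:
            if next_step == len(steps):
                break  # funnel already completed
            if start is not None and ts - start > window:
                break  # outside the time bound (re-entry is not modeled here)
            if step == steps[next_step]:
                if next_step == 0:
                    start = ts
                reached[next_step] += 1
                next_step += 1
    return reached


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time where churn was observed.

    durations: time until churn, or until last observation if censored.
    observed:  True if churn was seen, False if right-censored.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival, curve = 1.0, []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        churned = leaving = 0
        while i < len(pairs) and pairs[i][0] == t:
            churned += pairs[i][1]  # True counts as 1
            leaving += 1
            i += 1
        if churned:
            # Users censored at t are still counted as at risk at t.
            survival *= 1 - churned / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Per-step conversion follows from dividing adjacent entries of the `funnel_counts` result. In the survival sketch, censored users stay in the at-risk denominator until their last observed time, which is why recent, partially observed users do not drag the estimate down.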
Worked example
Example question: "Design a cohort analysis to measure 7-day retention after a new onboarding flow." Start by stating the objective: 7-day active retention for users who went through the new onboarding flow. Define the cohort (new users keyed by first_session_date, which serves as the acquisition date) and the primary event (a qualifying session or key action). Specify the window (days 0–7, with day 0 as the first-session day), the aggregation level (daily or weekly cohorts), and exclusions (bots, internal users). Outline the implementation: identify first_session_date with a window function, join to the event table to flag activity by day since acquisition, and pivot into a retention table. Finally, mention checks: confirm instrumentation is complete, compare against a historical baseline, and run a statistical comparison if the cohort was a treatment arm in an A/B test.
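The steps in this answer can be sketched end to end in plain Python. Everything named here is an assumption made for illustration: the `(user_id, event_date, arm)` event shape, the helper names `day_n_retention` and `two_proportion_z`, and the choice to count a user as retained only if they are active exactly on day N. Counting any activity in days 1 to N would be a different metric.

```python
import math
from datetime import date, timedelta


def day_n_retention(events, n, excluded=frozenset()):
    """Return {arm: (retained_users, cohort_size)} for day-N retention.

    events:   iterable of (user_id, event_date, arm) tuples (assumed shape).
    excluded: user_ids to drop, e.g. bots and internal users.
    """
    first_seen, arm_of, active_days = {}, {}, {}
    for user_id, day, arm in events:
        if user_id is None or user_id in excluded:
            continue  # skip rows with missing ids and excluded users
        if user_id not in first_seen or day < first_seen[user_id]:
            first_seen[user_id] = day
        arm_of.setdefault(user_id, arm)
        # A set of dates dedupes repeated events on the same day.
        active_days.setdefault(user_id, set()).add(day)

    result = {}
    for user_id, start in first_seen.items():
        retained, size = result.get(arm_of[user_id], (0, 0))
        hit = (start + timedelta(days=n)) in active_days[user_id]
        result[arm_of[user_id]] = (retained + hit, size + 1)
    return result


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A call such as `day_n_retention(events, 7, excluded=bot_ids)` yields per-arm counts that feed directly into `two_proportion_z`. The z-test treats users as independent and looks at a single time point, so it should not be repeated across days 1 through 7 without a correction.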
A common pitfall
A tempting but wrong approach is to compute retention as the percentage of all historical users who are active, or as average events per user, rather than as cohort-based day-N retention. This dilutes the signal with resurrected or ineligible users and mixes acquisition timing, which can produce misleading apparent improvements. Also watch for right-censoring: recent cohorts can appear worse simply because their day-N window has not yet elapsed, so restrict day-N comparisons to cohorts that have been observed for at least N days.
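One simple guard against the censoring problem is to count only users whose day N is already observable. The sketch below assumes hypothetical inputs: a `first_seen` mapping of user to acquisition date, an `active_days` mapping of user to the set of dates with a qualifying event, and an `as_of` date marking the last day with complete data.

```python
from datetime import date, timedelta


def mature_day_n_retention(first_seen, active_days, n, as_of):
    """Day-N retention over users whose day N falls on or before `as_of`.

    first_seen:  {user_id: acquisition date}
    active_days: {user_id: set of dates with a qualifying event}
    as_of:       last date for which event data is complete
    Returns (retained, eligible).
    """
    retained = eligible = 0
    for user_id, start in first_seen.items():
        target = start + timedelta(days=n)
        if target > as_of:
            continue  # right-censored: day N has not happened yet
        eligible += 1
        retained += target in active_days.get(user_id, set())
    return retained, eligible
```

Dividing `retained` by the full user count instead of `eligible` would count censored users as churned, which is exactly how recent cohorts end up looking worse than they are.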
Further reading
- Lean Analytics (Croll & Yoskovitz) — pragmatic retention and cohort techniques.
- Kohavi et al., "Trustworthy Online Controlled Experiments" — for connecting experiments to retention metrics.