Define and compute retention and churn precisely
Company: DoorDash
Role: Data Scientist
Category: Statistics & Math
Difficulty: hard
Interview Round: Onsite
Define retention and churn for a transactional consumer app and show how you would compute them correctly:
1) Choose precise definitions for cohorts (signup vs. first purchase), activity (active = at least one order in the period), retention types (N-day, week-N, rolling, bracket), and churn (no activity for K consecutive periods). Justify each choice by the decision it is meant to support.
2) Provide formulas for cohort retention and churn rates using proper risk sets, handling right-censoring and delayed conversion. Explain pitfalls such as survivorship bias, Simpson’s paradox, and seasonality.
3) Describe how to measure the long-term retention impact of a treatment (e.g., a 20% discount) using survival analysis: define time-to-churn, hazard, and cumulative incidence; specify how to compare curves (e.g., log-rank or stratified log-rank tests) and how to adjust for covariates.
4) Show how rolling retention can disagree with strict cohort retention and how you would reconcile the two for executives. Include an example with made-up numbers that illustrates the difference, and compute both metrics correctly.
5) Explain how you would set windows (washout, observation, and attribution) and how these choices affect experiment power and bias.
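For part 1, here is a minimal sketch of the definitions. It assumes first-purchase cohorts, because retention decisions concern users who have already ordered, while signup cohorts suit acquisition and activation questions. It also assumes a weekly period, since food-delivery orders are rarely daily and day-N retention is therefore noisy, and day-level order logs. The `User` record, the helper names, and the four sample users are hypothetical. `is_churned` implements "no order in the K periods before the evaluation date" and declines to label users with less than K periods of history.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    """Hypothetical record; the cohort is anchored on the first purchase (an assumed choice)."""
    user_id: str
    first_order_day: int    # absolute day index of the first order (cohort anchor)
    order_days: List[int]   # absolute day indices of every order, including the first

    def offsets(self) -> List[int]:
        """Days since the cohort anchor for each order (the first order is day 0)."""
        return sorted(d - self.first_order_day for d in self.order_days)


def n_day(u: User, n: int) -> bool:
    """Classic N-day retention: at least one order exactly on day N."""
    return n in u.offsets()


def week_n(u: User, w: int) -> bool:
    """Week-N retention: at least one order in days [7w, 7w + 6]."""
    return any(7 * w <= d <= 7 * w + 6 for d in u.offsets())


def rolling(u: User, n: int) -> bool:
    """Rolling (unbounded) retention: at least one order on day N or later."""
    return any(d >= n for d in u.offsets())


def bracket(u: User, lo: int, hi: int) -> bool:
    """Bracket retention: at least one order in days [lo, hi]."""
    return any(lo <= d <= hi for d in u.offsets())


def is_churned(u: User, as_of_day: int, period_days: int, k: int) -> Optional[bool]:
    """Churned = no order in the K periods immediately before as_of_day.
    Returns None if the user has fewer than K periods of history (not yet evaluable)."""
    window_start = as_of_day - k * period_days
    if u.first_order_day > window_start:
        return None
    return not any(window_start <= d < as_of_day for d in u.order_days)


def rate(flags: List[bool]) -> float:
    """Share of users for whom the flag is True."""
    return sum(flags) / len(flags)


users = [
    User("a", 0, [0, 1, 8, 30]),
    User("b", 0, [0, 7]),
    User("c", 0, [0]),
    User("d", 2, [2, 3, 40]),
]
print("day-1:", rate([n_day(u, 1) for u in users]))                 # 0.5
print("week-1:", rate([week_n(u, 1) for u in users]))               # 0.5
print("rolling day-7:", rate([rolling(u, 7) for u in users]))       # 0.75
print("bracket 28-35:", rate([bracket(u, 28, 35) for u in users]))  # 0.25
print("churned, K=2 weeks:", [is_churned(u, 42, 7, 2) for u in users])
```

One common way to pick K is to choose the smallest number of inactive periods after which few users ever order again. This keeps churn labels from being frequently reversed by later orders.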
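For part 2, the risk-set rule can be written as R_c(N) = (users in cohort c with at least one order in week N) / (users in cohort c whose week N has fully elapsed by the data cutoff). Users whose week N is still open are right-censored and leave both the numerator and the denominator. The sketch below uses made-up data pooled over two weekly cohorts and contrasts this rule with the naive estimate, which divides by the whole cohort. With first-purchase cohorts, delayed converters simply enter a later cohort. With signup cohorts, conversion by day D should be computed only for signups at least D days old. Function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (cohort_start_day, [absolute order days]) per user
Users = List[Tuple[int, List[int]]]


def active_in_week(start: int, orders: List[int], w: int) -> bool:
    """At least one order in days [7w, 7w + 6] after the user's cohort start."""
    return any(7 * w <= d - start <= 7 * w + 6 for d in orders)


def weekly_retention(users: Users, data_end_day: int,
                     max_week: int) -> Dict[int, Tuple[int, int, Optional[float]]]:
    """Week-N retention with a proper risk set: a user counts (numerator AND denominator)
    only if week N is fully observed, i.e. start + 7N + 6 <= data_end_day.
    Users whose week N is not over yet are right-censored for that week."""
    out = {}
    for w in range(1, max_week + 1):
        at_risk = [(s, o) for s, o in users if s + 7 * w + 6 <= data_end_day]
        retained = sum(active_in_week(s, o, w) for s, o in at_risk)
        out[w] = (retained, len(at_risk), retained / len(at_risk) if at_risk else None)
    return out


def naive_weekly_retention(users: Users, max_week: int) -> Dict[int, float]:
    """Biased version: divides by the whole cohort, so censored users count as churned."""
    return {w: sum(active_in_week(s, o, w) for s, o in users) / len(users)
            for w in range(1, max_week + 1)}


# Made-up users: three start on day 0, three start on day 7; data ends on day 20.
users = [
    (0, [0, 8, 15]), (0, [0, 9]), (0, [0]),
    (7, [7, 14]), (7, [7, 15]), (7, [7]),
]
print(weekly_retention(users, data_end_day=20, max_week=2))
print(naive_weekly_retention(users, max_week=2))
```

Here week-2 retention is 1/3 with the risk set but 1/6 naively, because the day-7 cohort cannot yet have a week 2. A per-period churn rate uses the same idea: churn events in period t divided by users still at risk (not yet churned and not censored) at the start of t. Pooled figures like these also mix cohorts of different ages; computing per cohort and combining with explicit weights avoids Simpson's-paradox reversals.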
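For part 3, here is a stdlib-only sketch. It assumes each user's duration is already computed as time from randomization until churn (for example, the week in which a K-week inactivity run is completed), and that users who have not churned by the data cutoff are marked as censored. `km_curve` returns the Kaplan-Meier survival S(t) and the discrete hazard h(t) = d_t / n_t over the risk set. The cumulative incidence of churn is 1 - S(t). That identity holds only when churn is the only event type; competing events such as account deletion would need an Aalen-Johansen estimator. `logrank_test` sums observed-minus-expected churns and their variances across strata. Passing one stratum gives the ordinary log-rank test, and passing one stratum per market (say) gives the stratified version. All names and the toy data are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def km_curve(durations: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Kaplan-Meier estimate of S(t) = P(T > t) for time-to-churn T.
    durations: time to churn or to censoring; events: 1 = churn observed, 0 = censored.
    Returns (t, hazard d_t / n_t, S(t)) at each distinct churn time."""
    curve, s = [], 1.0
    for t in sorted({d for d, e in zip(durations, events) if e}):
        n_t = sum(1 for d in durations if d >= t)                          # risk set at t
        d_t = sum(1 for d, e in zip(durations, events) if d == t and e)    # churns at t
        s *= 1 - d_t / n_t
        curve.append((t, d_t / n_t, s))
    return curve


def _logrank_terms(dur_a, ev_a, dur_b, ev_b) -> Tuple[float, float]:
    """Observed-minus-expected churns in group A and its hypergeometric variance."""
    o_minus_e, var = 0.0, 0.0
    pooled = zip(list(dur_a) + list(dur_b), list(ev_a) + list(ev_b))
    for t in sorted({d for d, e in pooled if e}):
        n_a = sum(1 for d in dur_a if d >= t)
        n_b = sum(1 for d in dur_b if d >= t)
        d_a = sum(1 for d, e in zip(dur_a, ev_a) if d == t and e)
        d_b = sum(1 for d, e in zip(dur_b, ev_b) if d == t and e)
        n, d_t = n_a + n_b, d_a + d_b
        if n < 2:
            continue  # a lone subject at risk adds nothing to either sum
        o_minus_e += d_a - d_t * n_a / n
        var += d_t * (n_a / n) * (n_b / n) * (n - d_t) / (n - 1)
    return o_minus_e, var


def logrank_test(strata) -> Tuple[float, float]:
    """(Stratified) log-rank test. strata: list of (dur_a, ev_a, dur_b, ev_b), one per
    stratum; pass a single tuple for the unstratified test. Returns (chi2, p), 1 df."""
    terms = [_logrank_terms(*s) for s in strata]
    o_minus_e = sum(t[0] for t in terms)
    var = sum(t[1] for t in terms)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))  # chi-square(1) equals z squared
    return chi2, p


# Made-up durations in weeks; 1 = churned, 0 = censored at the cutoff.
control_dur, control_ev = [1, 2, 2, 3, 4], [1, 1, 0, 1, 0]
treat_dur, treat_ev = [2, 3, 4, 5, 5], [1, 0, 1, 0, 0]
print(km_curve(control_dur, control_ev))
print(logrank_test([(treat_dur, treat_ev, control_dur, control_ev)]))
```

For covariate adjustment, a Cox proportional-hazards model (for example lifelines' `CoxPHFitter`) estimates the treatment hazard ratio conditional on covariates. Its proportional-hazards assumption should be checked, because a discount whose effect fades after the promotion ends would violate it. In practice, lifelines' `KaplanMeierFitter` and `lifelines.statistics.logrank_test` are preferable to hand-rolled code.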
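For part 4, here is a worked example with a made-up cohort of 8 users observed through week 6. Strict week-N retention counts users who ordered in week N itself. Rolling week-N retention counts users who ordered in week N or any later observed week. As a result, rolling retention is always at least as large as strict retention, never increases with N, and is revised upward as later data arrives. Function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Set

# Made-up cohort: each set holds the weeks since first order (week 0)
# in which the user placed at least one order. Data is observed through week 6.
COHORT: List[Set[int]] = [
    {0, 1, 2, 3, 4, 5},
    {0, 1, 3, 5},
    {0, 2, 6},
    {0, 4},
    {0, 1},
    {0},
    {0, 6},
    {0, 3},
]


def strict_week_n(cohort: List[Set[int]], n: int, cutoff: int) -> Fraction:
    """Strict cohort retention: share with at least one order in week n itself."""
    if n > cutoff:
        raise ValueError("week n is not fully observed")
    return Fraction(sum(n in weeks for weeks in cohort), len(cohort))


def rolling_week_n(cohort: List[Set[int]], n: int, cutoff: int) -> Fraction:
    """Rolling retention: share with at least one order in any week from n through cutoff."""
    if n > cutoff:
        raise ValueError("week n is not fully observed")
    return Fraction(sum(any(n <= w <= cutoff for w in weeks) for weeks in cohort), len(cohort))


for n in range(1, 5):
    s, r = strict_week_n(COHORT, n, 6), rolling_week_n(COHORT, n, 6)
    # rolling = strict + late returners (inactive in week n, back before the cutoff)
    print(f"week {n}: strict {float(s):.3f}  rolling {float(r):.3f}  late returners {float(r - s):.3f}")

# The same rolling metric for week 4, computed when only weeks 0-4 were observed:
print("rolling week 4 at cutoff 4:", float(rolling_week_n(COHORT, 4, 4)))
```

In this example, strict week-4 retention is 25% while rolling week-4 retention is 62.5%. Strict retention even rises from week 2 to week 3 (25% to 37.5%), which rolling retention cannot do. The rolling week-4 figure is 25% when computed at a week-4 cutoff and 62.5% at week 6. For executives, one reconciliation is to:

- report strict week-N retention as the headline metric,
- show rolling retention together with its cutoff date, and
- present the gap explicitly as late returners (rolling = strict + late returners).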
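For part 5, here is a rough sketch of how window lengths trade off against power. It assumes a fixed enrollment period and a fixed analysis date. A user contributes only if the washout plus the observation window has fully elapsed; an example washout is the first days after exposure, when novelty or the discount itself dominates behavior. Longer windows capture more durable effects and reduce novelty bias, but they shrink the fully observed sample. The minimum detectable effect (MDE) uses the standard normal approximation for a two-proportion z-test. Enrollment volume, dates, and baseline rates are made up, the function names are hypothetical, and the attribution window (which orders count toward the outcome) is not modeled.

```python
import math
from statistics import NormalDist


def fully_observed_users(daily_enroll: int, enroll_days: int, analysis_day: int,
                         obs_window_days: int, washout_days: int = 0) -> int:
    """Users per arm whose washout + observation window ends by analysis_day.
    Enrollment days are 0 .. enroll_days - 1; a user enrolled on day e is usable
    only if e + washout_days + obs_window_days <= analysis_day."""
    last_usable = analysis_day - washout_days - obs_window_days  # latest usable enrollment day
    usable_days = min(enroll_days, last_usable + 1)
    return max(usable_days, 0) * daily_enroll


def mde_two_proportions(p_base: float, n_per_arm: int,
                        alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate absolute MDE for a two-sided two-proportion z-test with equal arms,
    using the baseline variance in both arms."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * math.sqrt(2 * p_base * (1 - p_base) / n_per_arm)


# Made-up inputs: 2,000 users per arm per day enrolled for 28 days, analysis on day 56,
# and baseline retention rates that fall as the observation window lengthens.
for window, washout, p in [(14, 0, 0.40), (28, 0, 0.30), (28, 7, 0.30), (42, 0, 0.25)]:
    n = fully_observed_users(2000, 28, 56, window, washout)
    mde = mde_two_proportions(p, n) if n else float("nan")
    print(f"window={window:2d}d washout={washout}d n/arm={n:6d} MDE={mde:.4f}")
```

Dropping incompletely observed users, rather than counting them as inactive, keeps the comparison unbiased, and the cost shows up as a larger MDE. On the attribution side, a window that is too short tends to bias a retention effect toward zero by missing delayed orders. A window that is too long adds variance from unrelated orders.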
Quick Answer: This question evaluates a Data Scientist's competency in the statistical measurement of user retention and churn. It covers cohort definition, activity rules, risk-set-aware retention and churn formulas, handling of censoring and delayed conversion, and survival-analysis concepts such as time-to-churn, hazard, and cumulative incidence.