Cohort, Retention, Funnel And Product Metrics
Asked of: Data Scientist

What's being tested
Interviewers are probing whether you can turn messy user-event data into defensible cohort, retention, funnel, and product metric conclusions. For TikTok, this matters because small changes in posting, viewing, shopping, or ad-load behavior can affect creator supply, viewer engagement, and monetization simultaneously. A strong Data Scientist should define the right denominator, align events in time, segment users meaningfully, and distinguish metric movement from causal impact. Expect to explain both the analysis logic and the business interpretation, not just write a query.
Core knowledge
- Cohort analysis groups users by a shared starting condition, usually first registration, first post, first purchase, or first exposure date. The most common cohort key is DATE(MIN(event_ts)) per user_id; always clarify whether the cohort is based on signup, first activity, or experiment assignment.
- Retention rate measures whether users return after an anchor event. For day-N retention, with day 0 as each user's anchor date: $\text{Retention}_N = \dfrac{\#\{\text{cohort users active on day } N\}}{\#\{\text{users in the cohort}\}}$. Clarify whether “active” means opening the app, watching, posting, liking, purchasing, or any qualifying event.
- Exact and rolling retention answer different questions. Day-7 exact retention counts users active exactly on day 7, while 7-day rolling retention, as used here, counts users active any time from day 1 through day 7 (some tools use "rolling" or "unbounded" retention for users active on day 7 or any later day, so state the definition). TikTok-style engagement analyses often need both because habit formation and occasional usage behave differently.
- Funnel analysis tracks ordered conversion through steps such as view_item → add_to_cart → purchase or video_view → profile_visit → follow. Define whether the funnel is user-level, session-level, or item-level; conversion rates change dramatically depending on that unit.
- Temporal ordering is essential in funnels. Use event timestamps and patterns like ROW_NUMBER() OVER (PARTITION BY user_id, product_id, event_type ORDER BY event_ts) to keep the first occurrence of each step and deduplicate repeated events, then require step k+1 to occur after step k. Counting unordered events inflates conversion rates.
- Metric decomposition separates topline movement into interpretable components. For ad revenue, a useful identity is: $\text{Ad revenue} = \text{DAU} \times \dfrac{\text{watch time}}{\text{DAU}} \times \dfrac{\text{ad impressions}}{\text{watch time}} \times \dfrac{\text{ad revenue}}{\text{ad impressions}}$, i.e. traffic × engagement × ad load × price per impression. This lets you diagnose whether revenue changed because of traffic, ad load, auction pricing, or engagement.
- DAU growth vs monetization tradeoffs require both guardrail and objective metrics. A growth feature might increase DAU but reduce ARPDAU, session depth, or creator posting. A monetization change might increase ad_revenue but hurt watch_time, retention, or long-term user value.
- Segmentation is not optional. Retention and funnel behavior differ by new vs existing users, creator vs viewer, geography, device, traffic source, content vertical, and user maturity. A flat average can hide Simpson’s paradox, especially when traffic mix shifts across regions or acquisition channels.
- Causal inference matters whenever you interpret metric changes. If a cohort has higher retention after a launch, ask whether it was exposed to a treatment, acquired through a different channel, or affected by seasonality. Prefer randomized A/B tests; otherwise consider difference-in-differences, matching, or regression adjustment.
- Censoring and incomplete windows can silently bias retention. If today is 2026-05-23, users acquired on 2026-05-20 cannot yet have day-7 retention. Exclude immature cohorts from day-N calculations or mark them incomplete instead of treating missing activity as non-retention.
- Statistical uncertainty should accompany metric reads. For a binary retention metric with observed rate $\hat{p}$ over $n$ cohort users, a rough standard error is $\sqrt{\hat{p}(1-\hat{p})/n}$; for funnel rates, compare proportions with confidence intervals or logistic regression. With very large samples, focus on practical significance, not only tiny p-values.
- Deduplication and event definition are analysis-layer responsibilities. If users can post multiple times or fire duplicate click events, decide whether to count distinct users, distinct sessions, or total events. For example, COUNT(DISTINCT user_id) answers active users; COUNT(*) answers total activity volume.
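A minimal Python sketch of the temporal-ordering rule for funnels, under stated assumptions: events arrive as hypothetical (user_id, event_type, ts) tuples, and ordered_funnel and FUNNEL are illustrative names, not a real schema. Rather than the ROW_NUMBER() pattern, it walks each user's events in time order and credits step k+1 only for an event after step k was completed, so repeated events count at most once per user.

```python
from collections import defaultdict

# Hypothetical funnel definition; steps must happen in this order.
FUNNEL = ["view_item", "add_to_cart", "purchase"]


def ordered_funnel(events, steps=FUNNEL):
    """Count distinct users who reach each funnel step in order.

    events: iterable of (user_id, event_type, ts) tuples with sortable ts.
    A user is credited with step k+1 only for an event that comes after the
    event that completed step k, so each step counts at most once per user.
    """
    by_user = defaultdict(list)
    for user_id, event_type, ts in events:
        by_user[user_id].append((ts, event_type))

    reached = [0] * len(steps)
    for user_events in by_user.values():
        next_step = 0  # index of the step this user still needs to complete
        for _, event_type in sorted(user_events):
            if next_step < len(steps) and event_type == steps[next_step]:
                reached[next_step] += 1
                next_step += 1
    return dict(zip(steps, reached))


events = [
    ("u1", "view_item", 1), ("u1", "view_item", 2),    # duplicate view counted once
    ("u1", "add_to_cart", 3), ("u1", "purchase", 4),
    ("u2", "add_to_cart", 1), ("u2", "view_item", 2),  # cart before view: not credited
    ("u3", "view_item", 5), ("u3", "purchase", 6),     # skipped add_to_cart
]
print(ordered_funnel(events))  # {'view_item': 3, 'add_to_cart': 1, 'purchase': 1}
```

Counting distinct users per event type without ordering would credit 2 of the 3 users with add_to_cart and 2 with purchase; the ordered count credits only u1, which is the inflation the temporal-ordering point warns about.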
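To make the Simpson’s paradox warning in the segmentation point concrete, here is a small Python example with made-up counts (the cohorts dict and retention_rates helper are hypothetical). Retention rises inside both the new and existing segments between the two periods, yet the blended rate falls because period B is dominated by lower-retaining new users.

```python
# Hypothetical (retained, cohort_size) counts by user segment and period.
cohorts = {
    "period_a": {"new": (20, 100), "existing": (540, 900)},
    "period_b": {"new": (132, 600), "existing": (248, 400)},
}


def retention_rates(segments):
    """Return per-segment retention and the blended (overall) retention."""
    by_segment = {name: r / n for name, (r, n) in segments.items()}
    retained = sum(r for r, _ in segments.values())
    size = sum(n for _, n in segments.values())
    return by_segment, retained / size


for period, segments in cohorts.items():
    by_segment, overall = retention_rates(segments)
    rounded = {name: round(rate, 2) for name, rate in by_segment.items()}
    print(period, rounded, "overall:", round(overall, 2))
```

Period A blends to 0.56 and period B to 0.38, even though each segment improved by two points; reporting only the flat average would point the wrong way.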
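For the causal-inference point, a difference-in-differences read can be sketched in a few lines of Python. The retention figures and the did_estimate helper are hypothetical, and the estimate is only as good as the parallel-trends assumption noted in the docstring.

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: change in treated group minus change in control.

    Only valid under parallel trends: without the launch, both groups would
    have moved by the same amount (e.g. the same seasonal lift).
    """
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)


# Hypothetical day-7 retention in launch markets vs comparable non-launch markets.
naive_lift = 0.45 - 0.40  # pre/post in launch markets only; absorbs seasonality
did_lift = did_estimate(treat_pre=0.40, treat_post=0.45, ctrl_pre=0.38, ctrl_post=0.41)
print(f"naive {naive_lift:.3f}, DiD {did_lift:.3f}")
```

The naive pre/post comparison reports a 5-point lift, while subtracting the 3-point change in comparison markets leaves about 2 points attributable to the launch, assuming parallel trends holds.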
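The rough standard error from the statistical-uncertainty point, plus a normal-approximation confidence interval for the difference between two retention rates, might look like this in Python. The counts are hypothetical, and for small samples or rates near 0 or 1 a Wilson or exact interval (or the logistic regression mentioned above) would be a safer choice.

```python
import math


def retention_se(retained, cohort_size):
    """Rough standard error of a binary rate: sqrt(p * (1 - p) / n)."""
    p = retained / cohort_size
    return math.sqrt(p * (1 - p) / cohort_size)


def diff_ci(x_a, n_a, x_b, n_b, z=1.96):
    """Normal-approximation ~95% CI for p_b - p_a using unpooled standard errors."""
    p_a, p_b = x_a / n_a, x_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical A/B read: 4,000 of 10,000 control users vs 4,150 of 10,000
# treatment users retained on day 7.
print(round(retention_se(4_000, 10_000), 4))
low, high = diff_ci(4_000, 10_000, 4_150, 10_000)
print(f"95% CI for the lift: [{low:.4f}, {high:.4f}]")
```

The interval excludes zero here, but whether a 1.5-point retention lift is worth shipping is still a practical-significance judgment.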
Worked example
For the question “Analyze Trade-off Between DAU Growth and Ad Revenue,” a strong candidate would start by clarifying the setup: “Are we evaluating an experiment, an observed trend, or a proposed product change? Are DAU and ad revenue measured globally or by market, and over what time horizon?” Then they would define the primary metrics: DAU, ad_revenue, ARPDAU, retention, watch_time, ad_impressions_per_user, and possibly creator-side metrics if the feature affects posting supply.
The answer should be organized around four pillars. First, decompose revenue into traffic, engagement, ad inventory, fill rate, and price, so the tradeoff is not treated as one black-box number. Second, segment by user maturity, geography, acquisition source, and engagement level to see whether growth comes from low-monetizing or high-retaining users. Third, evaluate causality with an A/B test if available, using DAU or retention as engagement metrics and revenue per eligible user as monetization metrics. Fourth, make a decision framework: launch if long-term engagement gain outweighs short-term revenue loss, but block if guardrails like day-7 retention, session length, or ad fatigue degrade materially.
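The first pillar can be sketched by writing ad revenue as the product DAU × (watch minutes / DAU) × (impressions / watch minute) × (revenue / impression), which holds by construction, and splitting a revenue change across factors with log differences, which sum exactly to the log change in revenue. This is a minimal Python sketch: the input tuples, factor names, and numbers are hypothetical, and log attribution is one convenient choice, not the only one.

```python
import math


def decompose(dau, watch_minutes, impressions, revenue):
    """Factors of: revenue = DAU * (min / DAU) * (impressions / min) * (revenue / impression)."""
    return {
        "traffic": dau,
        "engagement": watch_minutes / dau,
        "ad_load": impressions / watch_minutes,
        "price": revenue / impressions,
    }


def log_contributions(before, after):
    """Split the log change in revenue across factors; the parts sum to the total."""
    f0, f1 = decompose(*before), decompose(*after)
    parts = {name: math.log(f1[name] / f0[name]) for name in f0}
    total = math.log(after[-1] / before[-1])
    return parts, total


# Hypothetical daily aggregates: (DAU, watch minutes, ad impressions, ad revenue).
before = (1_000_000, 60_000_000, 3_000_000, 30_000.0)
after = (1_050_000, 61_950_000, 2_787_750, 27_877.5)
parts, total = log_contributions(before, after)
for name, value in parts.items():
    print(f"{name:>10}: {value:+.4f}")
print(f"{'total':>10}: {total:+.4f}")
```

In this made-up read, traffic adds about 0.049 log points, lower engagement and lower ad load subtract about 0.017 and 0.105, and price is flat, so the revenue drop is mostly an ad-load decision rather than an auction effect. If ad requests are logged, ad_load can be split further into requests per minute × fill rate.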
One explicit tradeoff to flag is short-term versus long-term value. Increasing ad load may raise same-day revenue while reducing future retention; reducing ad load may lower ARPDAU today but increase future LTV. A strong close would be: “If I had more time, I’d estimate cumulative 7-, 14-, and 28-day revenue and retention by cohort, not just same-day DAU, because this decision depends on lifetime value rather than a single-day metric.”
A second angle
For “Calculate User Registration Date and 7-Day Retention Rate,” the same core concept becomes more operational and cohort-oriented. Instead of debating business tradeoffs, the candidate must define each user’s registration or first activity date, assign them to a cohort, and check whether they performed a qualifying action exactly seven days later or within a defined 7-day window. The main constraint is precision: registration_date, timezone, event type, and inclusion of incomplete cohorts must be handled consistently. The interviewer is likely testing whether you can translate a product definition into a reproducible metric while avoiding denominator leakage. The best answer still includes interpretation: day-7 retention is not just a query output; it tells whether new users are forming a habit.
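A minimal Python sketch of this operational version, under stated assumptions: activity rows are hypothetical (user_id, activity_date) pairs already converted to the reporting timezone, each user’s cohort date is their first qualifying activity (standing in for registration_date), and as_of is the last fully loaded date. It reports both the exact day-7 rate and the day 1-7 window rate over mature cohorts only.

```python
from collections import defaultdict
from datetime import date, timedelta


def day7_retention(activity, as_of):
    """Day-7 exact and day 1-7 window retention for mature cohorts.

    activity: iterable of (user_id, activity_date) for qualifying actions, with
    dates already in the reporting timezone. Each user's cohort date is their
    first activity date (a stand-in for registration_date).
    as_of: last fully loaded date; cohorts younger than 7 days are excluded.
    """
    cohort_date = {}
    active_days = defaultdict(set)
    for user_id, day in activity:
        cohort_date[user_id] = min(day, cohort_date.get(user_id, day))
        active_days[user_id].add(day)

    # Denominator: only users whose day 7 has already happened.
    mature = {u: c for u, c in cohort_date.items() if c + timedelta(days=7) <= as_of}
    if not mature:
        raise ValueError("no cohort has a complete day-7 window yet")

    exact = sum(c + timedelta(days=7) in active_days[u] for u, c in mature.items())
    window = sum(
        any(c + timedelta(days=k) in active_days[u] for k in range(1, 8))
        for u, c in mature.items()
    )
    n = len(mature)
    return exact / n, window / n, n


activity = [
    ("a", date(2026, 5, 1)), ("a", date(2026, 5, 8)),    # back exactly on day 7
    ("b", date(2026, 5, 1)), ("b", date(2026, 5, 4)),    # back on day 3 only
    ("c", date(2026, 5, 2)),                             # never returns
    ("d", date(2026, 5, 20)), ("d", date(2026, 5, 21)),  # immature: excluded
]
print(day7_retention(activity, as_of=date(2026, 5, 23)))
```

With as_of set to 2026-05-23, user d (first active 2026-05-20) is excluded, leaving 3 mature users: exact day-7 retention is 1/3 (only a) and window retention is 2/3 (a and b).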
Common pitfalls
Pitfall: Treating event counts as user counts.
A tempting wrong answer is to compute retention as COUNT(posts_on_day_7) / COUNT(posts_on_day_0). That measures posting volume, not retained users, and heavy posters will dominate the metric. A better answer uses COUNT(DISTINCT user_id) for retention and separately reports posts per retained user if activity intensity matters.
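The gap between the two definitions shows up even on a toy dataset. In this hypothetical Python sketch, everyone who posted on DAY0 is treated as the cohort (a simplification; a real cohort would key on each user’s first post), and one heavy poster returns on day 7.

```python
from datetime import date

# Hypothetical (user_id, post_date) rows; u1 is a heavy poster.
posts = [
    ("u1", date(2026, 5, 1)), ("u2", date(2026, 5, 1)), ("u3", date(2026, 5, 1)),
    ("u1", date(2026, 5, 8)), ("u1", date(2026, 5, 8)), ("u1", date(2026, 5, 8)),
]
DAY0, DAY7 = date(2026, 5, 1), date(2026, 5, 8)


def event_ratio(rows, day0, day7):
    """Tempting but wrong: COUNT(posts_on_day_7) / COUNT(posts_on_day_0)."""
    return sum(d == day7 for _, d in rows) / sum(d == day0 for _, d in rows)


def user_retention(rows, day0, day7):
    """Distinct-user retention, plus posts per retained user as a separate metric."""
    cohort = {u for u, d in rows if d == day0}
    retained = cohort & {u for u, d in rows if d == day7}
    retained_posts = sum(d == day7 and u in retained for u, d in rows)
    posts_per_retained = retained_posts / len(retained) if retained else 0.0
    return len(retained) / len(cohort), posts_per_retained


print(event_ratio(posts, DAY0, DAY7))     # volume ratio
print(user_retention(posts, DAY0, DAY7))  # (share of users retained, intensity)
```

The event ratio reports 1.0, as if everyone came back, because u1’s three day-7 posts stand in for three different users’ day-0 posts; distinct-user retention is 1/3, with 3.0 posts per retained user reported separately.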
Pitfall: Ignoring time alignment and cohort maturity.
Candidates often include users who have not yet had enough time to reach day 7, which mechanically depresses recent cohort retention. State that you would filter to cohorts with cohort_date <= current_date - interval '7 days' and align timestamps to the product’s reporting timezone before aggregating.
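Both fixes can be sketched with Python’s standard datetime module. The fixed UTC-8 offset is a hypothetical reporting timezone chosen to keep the example dependency-free, and is_mature simply mirrors the SQL filter quoted above.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical reporting timezone: a fixed UTC-8 offset. A real pipeline would
# more likely use an IANA zone (e.g. zoneinfo.ZoneInfo) to handle daylight saving.
REPORTING_TZ = timezone(timedelta(hours=-8))


def reporting_date(ts_utc):
    """Calendar date of a timezone-aware UTC timestamp in the reporting timezone."""
    return ts_utc.astimezone(REPORTING_TZ).date()


def is_mature(cohort_date, as_of, days=7):
    """Same rule as cohort_date <= current_date - interval '7 days'."""
    return cohort_date <= as_of - timedelta(days=days)


signup = datetime(2026, 5, 20, 3, 0, tzinfo=timezone.utc)
print(signup.date(), reporting_date(signup))  # UTC date vs reporting date
print(is_mature(date(2026, 5, 16), as_of=date(2026, 5, 23)))
print(is_mature(reporting_date(signup), as_of=date(2026, 5, 23)))
```

An event at 03:00 UTC on 2026-05-20 lands on 2026-05-19 in the reporting timezone, so bucketing by UTC date would put this user in the wrong cohort; with as_of 2026-05-23, only cohorts on or before 2026-05-16 pass the maturity check.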
Pitfall: Jumping to a launch recommendation without diagnosing the metric movement.
For a DAU versus ad revenue question, saying “choose revenue because it is business-critical” or “choose DAU because growth matters” is too shallow. A stronger response decomposes the metrics, checks user segments, estimates long-term impact, and frames the decision around objective metrics plus guardrails.
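One way to make “objective metrics plus guardrails” explicit is a decision rule agreed before the test. The Python sketch below is hypothetical throughout: MetricRead, launch_decision, the metric names, the confidence intervals, and the tolerated drops are illustrative, and checking each guardrail’s lower confidence bound against a tolerated drop is one conservative convention among several.

```python
from dataclasses import dataclass


@dataclass
class MetricRead:
    """Relative lift of treatment vs control with its confidence interval."""
    name: str
    lift: float
    ci_low: float
    ci_high: float


def launch_decision(objective, guardrails, max_drop):
    """Hypothetical rule: block if any guardrail could fall by more than its
    tolerated relative drop (CI lower bound below -max_drop); otherwise launch
    only if the objective's CI lies entirely above zero."""
    for g in guardrails:
        if g.ci_low < -max_drop[g.name]:
            return f"block: {g.name} may drop more than {max_drop[g.name]:.1%}"
    if objective.ci_low > 0:
        return "launch"
    return "no clear win: iterate or run longer"


objective = MetricRead("revenue_per_user_28d", 0.020, 0.005, 0.035)
guardrails = [
    MetricRead("day7_retention", -0.002, -0.006, 0.002),
    MetricRead("watch_time", -0.010, -0.025, 0.005),
]
print(launch_decision(objective, guardrails, {"day7_retention": 0.01, "watch_time": 0.02}))
print(launch_decision(objective, guardrails, {"day7_retention": 0.01, "watch_time": 0.03}))
```

With a 2% tolerance on watch time the launch is blocked, because the interval allows a 2.5% drop; relaxing it to 3% lets the positive 28-day revenue read through. The tolerances themselves are product decisions to agree on before the test, not after.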
Connections
Interviewers may pivot from here into experiment design, especially choosing primary metrics and guardrails for an A/B test. They may also ask about causal inference, ranking/recommender evaluation, or metric anomaly diagnosis, such as explaining why DAU rose while retention fell.
Further reading
- Trustworthy Online Controlled Experiments, a practical reference for experiment metrics, guardrails, and decision-making.
- Causal Inference: The Mixtape, useful grounding for interpreting non-randomized cohort and retention differences.
- Lean Analytics, an accessible product-metric framing across growth, retention, and monetization.
Practice questions
- Define Ultra success metrics and detect suspicious transactions (TikTok · Data Scientist · Technical Screen · easy)
- Analyze shopping funnel with joins and windows (TikTok · Data Scientist · Technical Screen · medium)
- Design robust metrics for a feature launch (TikTok · Data Scientist · Technical Screen · hard)
- Calculate Day-7 Retention Rate from User Post Data (TikTok · Data Scientist · Technical Screen · medium)
- Analyze Posting Behavior by Cohort and Date (TikTok · Data Scientist · Technical Screen · medium)
- Evaluate Cohort Posting Patterns Using Metrics and Tests (TikTok · Data Scientist · Technical Screen · medium)
- Design Metrics for Content Moderation and Chatbot Evaluation (TikTok · Data Scientist · Technical Screen · medium)
- Analyze Trade-off Between DAU Growth and Ad Revenue (TikTok · Data Scientist · Technical Screen · medium)
- Diagnose Decline in User Engagement and Experience Quality (TikTok · Data Scientist · Technical Screen · medium)
- Determine Metrics for Evaluating Homepage Recommendation Carousel (TikTok · Data Scientist · Onsite · medium)
- Track Key Metrics for Apple's New Phone Launch (TikTok · Data Scientist · Onsite · medium)
- Diagnose Search Issues with Relevant Metrics and Solutions (TikTok · Data Scientist · Onsite · medium)
Related concepts
- Product Metrics, Funnels, And Segmentation (Analytics & Experimentation)
- Product Metrics, Guardrails, And Retention (Analytics & Experimentation)
- CTR And Engagement Metrics (Analytics & Experimentation)
- Product Metric Design And Diagnostic Deep Dives (Analytics & Experimentation)
- Product Metrics, Funnels, And KPI Diagnosis (Analytics & Experimentation)
- Product Metrics, Guardrails, And Launch Decisions