Compute Cohort Retention Rate

Q: Compute Cohort Retention Rate

This question evaluates a data scientist's competency in cohort-based retention analysis, time-based data manipulation, and aggregation using Data Manipulation (SQL/Python) techniques.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Question

You are given two tables:

users(user_id BIGINT, signup_ts TIMESTAMP) — one row per user.
user_activity(user_id BIGINT, activity_ts TIMESTAMP, event_name VARCHAR) — zero or more activity rows per user.

Assume all timestamps are in UTC. Define a user's cohort as the calendar month of signup_ts. A user is retained in month k if they have at least one row in user_activity during the calendar month that is k months after their cohort month. Month 0 is the signup month.

Write a SQL query to compute cohort-based monthly retention for k = 0, 1, 2, 3, 4, 5, 6.

Return these columns:

cohort_month — first day of the signup month
months_since_signup — integer from 0 to 6
cohort_size — number of users in the cohort
retained_users — number of distinct users active in that relative month
retention_rate — retained_users / cohort_size as a decimal

Make sure each user is counted at most once per cohort-month and relative month.

Compute Cohort Retention Rate

Quick Overview

Comments (0)