PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Data Manipulation (SQL/Python)/Netflix

Aggregate D1 retention cohorts in SQL

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in time-series cohort analysis and SQL-based data manipulation, specifically measuring DAU, new-user cohorts, D1 retention and revenue-per-user while handling deduplication and user filtering.

  • Medium
  • Netflix
  • Data Manipulation (SQL/Python)
  • Data Scientist

Aggregate D1 retention cohorts in SQL

Company: Netflix

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

Today is 2025-09-01. Using SQL (optionally outline a pandas approach too), compute daily engagement and D1 retention for the last 7 days (2025-08-26 through 2025-09-01). Exclude any user_id beginning with 'bot_'. Use the following schema and sample data. Schema: users(user_id STRING, signup_date DATE, country STRING) events(user_id STRING, event_time TIMESTAMP, event_type STRING, product_id STRING) purchases(user_id STRING, purchase_time TIMESTAMP, order_id STRING, amount DECIMAL(10,2)) Sample tables (UTC times): users | user_id | signup_date | country | | u1 | 2025-08-25 | US | | u2 | 2025-08-26 | US | | u3 | 2025-08-26 | IN | | u4 | 2025-08-31 | US | | u5 | 2025-09-01 | US | | bot_1 | 2025-08-26 | US | events | user_id | event_time | event_type | product_id | | u1 | 2025-08-26 09:00:00 | app_open | NULL | | u1 | 2025-08-27 10:00:00 | app_open | NULL | | u2 | 2025-08-26 12:00:00 | app_open | NULL | | u2 | 2025-08-27 13:00:00 | view | P1 | | u2 | 2025-08-28 12:00:00 | app_open | NULL | | u3 | 2025-08-26 15:00:00 | app_open | NULL | | u3 | 2025-08-27 16:00:00 | app_open | NULL | | u4 | 2025-09-01 08:00:00 | app_open | NULL | | u5 | 2025-09-01 09:00:00 | app_open | NULL | | bot_1 | 2025-08-26 10:00:00 | app_open | NULL | purchases | user_id | purchase_time | order_id | amount | | u1 | 2025-08-27 11:00:00 | o1 | 9.99 | | u2 | 2025-08-28 11:30:00 | o2 | 4.00 | | u3 | 2025-08-29 14:00:00 | o3 | 2.50 | | u4 | 2025-09-01 09:30:00 | o4 | 1.00 | | u5 | 2025-09-02 10:00:00 | o5 | 5.00 | | bot_1 | 2025-08-26 10:30:00 | o6 | 100.00 | Task: Produce a single SQL query returning one row per day d with the following columns: (1) day (DATE), (2) dau: distinct users with any event on day d, (3) new_users: users whose first-ever event_time is on day d, (4) d1_retention_rate: among new_users on day d, fraction with any event on day d+1, (5) revenue_per_dau: total purchase amount on day d divided by dau. Notes: treat days in UTC; deduplicate exact-duplicate events via SELECT DISTINCT user_id, event_time, event_type, product_id before aggregating; exclude 'bot_%' users from all metrics. Provide the SQL and, briefly, how you would compute the same using pandas (high-level steps).

Quick Answer: This question evaluates a candidate's competency in time-series cohort analysis and SQL-based data manipulation, specifically measuring DAU, new-user cohorts, D1 retention and revenue-per-user while handling deduplication and user filtering.

Related Interview Questions

  • Write SQL for DAU and first-purchase conversion - Netflix (Medium)
  • Write SQL for rolling frequency caps - Netflix (Medium)
  • Transform flat keys into nested dictionary - Netflix (Medium)
  • Analyze Retention Metrics Using SQL and Python - Netflix (Medium)
  • Determine Maximum Consecutive Order Days Per User - Netflix (Medium)
Netflix logo
Netflix
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Data Manipulation (SQL/Python)
6
0

Today is 2025-09-01. Using SQL (optionally outline a pandas approach too), compute daily engagement and D1 retention for the last 7 days (2025-08-26 through 2025-09-01). Exclude any user_id beginning with 'bot_'. Use the following schema and sample data.

Schema: users(user_id STRING, signup_date DATE, country STRING) events(user_id STRING, event_time TIMESTAMP, event_type STRING, product_id STRING) purchases(user_id STRING, purchase_time TIMESTAMP, order_id STRING, amount DECIMAL(10,2))

Sample tables (UTC times): users | user_id | signup_date | country | | u1 | 2025-08-25 | US | | u2 | 2025-08-26 | US | | u3 | 2025-08-26 | IN | | u4 | 2025-08-31 | US | | u5 | 2025-09-01 | US | | bot_1 | 2025-08-26 | US |

events | user_id | event_time | event_type | product_id | | u1 | 2025-08-26 09:00:00 | app_open | NULL | | u1 | 2025-08-27 10:00:00 | app_open | NULL | | u2 | 2025-08-26 12:00:00 | app_open | NULL | | u2 | 2025-08-27 13:00:00 | view | P1 | | u2 | 2025-08-28 12:00:00 | app_open | NULL | | u3 | 2025-08-26 15:00:00 | app_open | NULL | | u3 | 2025-08-27 16:00:00 | app_open | NULL | | u4 | 2025-09-01 08:00:00 | app_open | NULL | | u5 | 2025-09-01 09:00:00 | app_open | NULL | | bot_1 | 2025-08-26 10:00:00 | app_open | NULL |

purchases | user_id | purchase_time | order_id | amount | | u1 | 2025-08-27 11:00:00 | o1 | 9.99 | | u2 | 2025-08-28 11:30:00 | o2 | 4.00 | | u3 | 2025-08-29 14:00:00 | o3 | 2.50 | | u4 | 2025-09-01 09:30:00 | o4 | 1.00 | | u5 | 2025-09-02 10:00:00 | o5 | 5.00 | | bot_1 | 2025-08-26 10:30:00 | o6 | 100.00 |

Task: Produce a single SQL query returning one row per day d with the following columns: (1) day (DATE), (2) dau: distinct users with any event on day d, (3) new_users: users whose first-ever event_time is on day d, (4) d1_retention_rate: among new_users on day d, fraction with any event on day d+1, (5) revenue_per_dau: total purchase amount on day d divided by dau. Notes: treat days in UTC; deduplicate exact-duplicate events via SELECT DISTINCT user_id, event_time, event_type, product_id before aggregating; exclude 'bot_%' users from all metrics. Provide the SQL and, briefly, how you would compute the same using pandas (high-level steps).

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Netflix•More Data Scientist•Netflix Data Scientist•Netflix Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.