Label new vs old users over time in SQL

Q: How do I practice SQL interview questions?

PracHub provides an interactive SQL console where you can write and test queries against real database schemas. Get instant feedback and compare your solution with the expected output.

Q: What difficulty level is this coding question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Meta.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Meta during technical interviews.

Question

Define users as “new” during the first 30 days inclusive after their signup_date, and “old” thereafter. Produce per-user, per-day labels over a window, acknowledging that a single user naturally has different labels on different days. Use today = 2025-09-01.
Schema:
users(user_id INT, signup_date DATE)
activity(user_id INT, activity_date DATE) -- any daily activity counts as presence
Sample data:
users
+---------+-------------+
| user_id | signup_date |
+---------+-------------+
| 1 | 2025-07-25 |
| 2 | 2025-08-20 |
| 3 | 2025-08-01 |
| 4 | 2025-09-01 |
+---------+-------------+
activity
+---------+---------------+
| user_id | activity_date |
+---------+---------------+
| 1 | 2025-08-02 |
| 1 | 2025-08-26 |
| 2 | 2025-08-25 |
| 2 | 2025-09-01 |
| 3 | 2025-08-15 |
| 3 | 2025-09-01 |
| 4 | 2025-09-01 |
+---------+---------------+
Tasks:
A) Write SQL that outputs (user_id, d, label) for every day d in [2025-08-02, 2025-09-01], labeling “new” if d between signup_date and signup_date + 29 days inclusive, else “old”. Avoid assigning multiple labels for the same user-day. Prefer generating dates only where activity exists (join to activity) to reduce scan cost.
B) Using your labels, compute two aggregates for the last 30 days relative to today (window = [2025-08-03, 2025-09-01]): (i) active_new_users and active_old_users per day; (ii) a single summary row with total distinct users by label over the window. Explain any assumptions about time zones and inclusive/exclusive bounds.

PracHub · Accepted Answer

This question evaluates a candidate's ability to perform time-based user labeling and aggregation in SQL, testing skills such as date arithmetic, joins, deduplication of user-day records, and efficient generation of per-user-per-day labels.

Quick Overview

Quick Overview