Write monthly new-vs-returning requests SQL
Company: Thumbtack
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
Given the schema and sample data below, write a single PostgreSQL query (no dynamic SQL) that returns, for every calendar month present in requests, the counts and shares of requests from NEW vs RETURNING users. Definitions: A user's FIRST_REQUEST_MONTH is date_trunc('month', MIN(request_ts)) over all their requests. A request is NEW if date_trunc('month', request_ts) = FIRST_REQUEST_MONTH for that user; otherwise it is RETURNING. Output columns: month (YYYY-MM), new_request_count, returning_request_count, new_share_pct (0–100 with two decimals), returning_share_pct. Requirements: (1) Include months even if one class has zero; (2) Avoid double counting; (3) Treat timestamps as UTC; (4) Do not assume contiguous months; (5) Be robust if a user exists in users but has no requests (they should not appear unless the month has requests from someone); (6) Order by month ascending. Also, show what your query would output on the sample data to verify understanding.
Schema:
users(user_id INT PRIMARY KEY, signup_dt DATE)
requests(request_id INT PRIMARY KEY, user_id INT REFERENCES users(user_id), request_ts TIMESTAMP NOT NULL)
Sample tables:
users
+---------+------------+
| user_id | signup_dt |
+---------+------------+
| 1 | 2025-01-10 |
| 2 | 2025-01-20 |
| 3 | 2025-02-02 |
| 4 | 2025-02-25 |
+---------+------------+
requests
+------------+---------+---------------------+
| request_id | user_id | request_ts |
+------------+---------+---------------------+
| 101 | 1 | 2025-01-11 08:00:00 |
| 102 | 1 | 2025-02-01 10:00:00 |
| 103 | 2 | 2025-01-25 12:00:00 |
| 104 | 2 | 2025-02-15 09:00:00 |
| 105 | 3 | 2025-02-10 14:00:00 |
| 106 | 3 | 2025-02-28 16:00:00 |
| 107 | 1 | 2025-03-05 11:00:00 |
| 108 | 4 | 2025-03-01 07:30:00 |
+------------+---------+---------------------+
Quick Answer: This question evaluates a candidate's ability to perform time-based SQL aggregation and classification, covering window functions to compute each user's first request month, conditional counts for NEW vs RETURNING requests, percentage formatting, and correct handling of UTC timestamps and edge cases.