Compute paid subscriber YoY counts by month
Company: Intuit
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Write a single SQL query (PostgreSQL) that returns calendar-month counts of new paid subscribers starting from 2019-06-01 (inclusive) through the latest month present in the data, along with the same-month prior-year count, absolute YoY change, and YoY percent change. Define a “new paid subscriber” as a company with a non-null subscriptiondate falling in that month. Exclude companies that ever received a free subscription by anti-joining to Free_Subs. Treat dates as UTC and months as calendar months based on subscriptiondate. Include months with zero paid subscribers. Output columns: month_start (DATE, first day of month), new_paid_subscribers (INT), prior_year_new_paid_subscribers (INT), yoy_abs_change (INT), yoy_pct_change (NUMERIC, round to 1 decimal; null if prior year is 0 or null). Assume large tables; avoid scanning entire ranges unnecessarily.
Schema and small sample data:
company
+-----------+------------+-----------------+----------------+
| companyid | signupdate | subscriptiondate| terminationdate|
+-----------+------------+-----------------+----------------+
| 1 | 2019-05-20 | 2019-06-01 | 2020-04-15 |
| 2 | 2019-06-15 | 2019-06-30 | NULL |
| 3 | 2019-07-02 | 2019-07-05 | 2019-09-01 |
| 4 | 2020-06-10 | 2020-06-12 | NULL |
| 5 | 2020-06-20 | NULL | NULL |
| 6 | 2018-12-31 | 2019-06-10 | 2019-07-01 |
| 7 | 2019-06-01 | 2019-06-01 | NULL |
| 8 | 2020-06-01 | 2020-06-30 | 2020-07-15 |
| 9 | 2019-06-20 | 2019-07-01 | NULL |
| 10 | 2019-05-01 | 2019-06-01 | NULL |
+-----------+------------+-----------------+----------------+
Free_Subs
+-----------+
| companyid |
+-----------+
| 3 |
| 7 |
+-----------+
Requirements/edge cases:
- Count each company at most once via its single subscriptiondate; ignore rows with NULL subscriptiondate.
- Exclude any companyid found in Free_Subs from all months.
- Generate a month series from 2019-06-01 to max(date_trunc('month', subscriptiondate)) to show months with zero counts.
- The YoY comparison is month vs. the same calendar month 12 months earlier; if no prior-year month exists, set prior_year_new_paid_subscribers and YoY deltas to NULL.
- Add brief notes on indexes you would create to make this performant (no need to actually create them in SQL).
Quick Answer: This question evaluates SQL-based data manipulation skills including calendar-month aggregation, year-over-year comparisons, anti-joins to exclude records, NULL handling, and query performance considerations on large tables.