Compute monthly new subscribers and YoY deltas
Company: Intuit
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Write a single SQL query that returns monthly counts of new subscribers starting at 2019-06, plus year-over-year (same-month prior year) comparisons. Treat a "new subscriber" as any company with a non-null subscription_date whose first (and only) subscription_date falls in that month. Include months with zero new subscribers. Return columns: month_start (DATE, first day of month), new_subscribers (INT), prior_year_new_subscribers (INT), yoy_abs_delta (INT), yoy_pct_delta (DECIMAL with 2 decimals; NULL if prior year is 0), and a flag is_partial_month (BOOLEAN) indicating whether the month is incomplete given the data watermark. The query must generate a proper month spine and perform a 12-month lag join, not rely on existing full-month rows. Assume the dataset may grow; do not hardcode end dates. Use subscription_date in UTC and explicitly truncate to month boundaries. Handle division-by-zero safely and ensure months with no data still appear with zeros. Schema to use and small sample data:
company
+------------+-------------+-------------------+------------------+
| company_id | signup_date | subscription_date | termination_date |
+------------+-------------+-------------------+------------------+
| 1 | 2019-05-20 | 2019-06-02 | 2020-01-10 |
| 2 | 2019-06-15 | 2019-06-20 | NULL |
| 3 | 2019-07-01 | 2019-07-05 | 2019-12-31 |
| 4 | 2020-06-10 | 2020-06-11 | NULL |
| 5 | 2020-06-30 | 2020-07-02 | NULL |
| 6 | 2020-07-15 | NULL | NULL |
+------------+-------------+-------------------+------------------+
Notes/constraints:
- Assume one row per company_id.
- Use the earliest available subscription_date per company_id (here it is unique already).
- Define data_watermark as the max(subscription_date) present; a month is partial if its last day > data_watermark.
- Start the spine at 2019-06-01 and end at the last month containing any subscription_date.
Quick Answer: This question evaluates proficiency in SQL time-series aggregation and cohort identification, including generating a month spine, computing year-over-year deltas, handling nulls and division-by-zero, and identifying partial months via a data watermark.