Count weekly customers with ≥$1000 YTD spend
Company: Databricks
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Technical Screen
##### Question
You are given a transaction-level table and must compute a weekly time series of how many customers have reached a year-to-date spending threshold. The problem can be solved in **SQL** or **Python (pandas)**.
## Table
**transactions**
- `date` (DATE) — transaction date (assume UTC, calendar dates)
- `transaction_id` (STRING/INT) — primary key
- `customer_id` (STRING/INT)
- `dollars` (NUMERIC) — non-negative spend amount in USD
## Definitions
- **Week**: a calendar week starting on Monday. Report each week by its `week_start_date` (the Monday of the week). You may pick a different, consistent week boundary if you state it.
- **Year-to-date (YTD) spend as of a week**: for a given `customer_id` in calendar year `Y`, the sum of `dollars` from **Jan 1 of year Y** up to and including the **last day of that week**.
## Task
For each week present in the data, compute the **number of distinct customers** whose **YTD spend as of that week** is **at least $1000**.
1. **Core metric.** For every week, count the distinct customers whose cumulative spend from the start of that calendar year through the end of the week is ≥ $1000.
2. **Multi-year handling.** If the data spans multiple calendar years, compute the metric **independently per year** — YTD resets to $0 on Jan 1 of each year. Include the `year` in the output.
3. **Ordering.** Order the results by `year`, then `week_start_date`.
## Output
Return a table with:
- `year` (INT)
- `week_start_date` (DATE)
- `num_customers_ge_1000_ytd` (INT)
(If you assume a single year, the original minimal output of `week_start` and `num_customers_ge_1000_ytd` is also acceptable — the `year` column is what generalizes it to multiple years.)
Quick Answer: Databricks data-scientist technical-screen question: given a transactions table, compute for each calendar week the number of distinct customers whose year-to-date spend reaches at least $1000, resetting YTD each January and supporting multiple years. It tests running-total window functions, per-customer-then-threshold logic, Monday-week bucketing, and distinct counts in SQL or pandas.