This question evaluates proficiency in time-series aggregation, cumulative year-to-date computations, distinct-count deduplication, and use of SQL or pandas for data manipulation and reporting.
You are given a transaction-level table and need to compute a weekly time series.
transactions
date
(DATE) — transaction date (assume UTC and calendar dates)
transaction_id
(STRING/INT) — primary key
customer_id
(STRING/INT)
dollars
(NUMERIC) — non-negative transaction amount in USD
customer_id
and a given week
w
within year
Y
, the sum of
dollars
from
Jan 1 of year Y
up to and including the last day of week
w
.
For each week present in the data, compute the number of distinct customers whose YTD spend as of that week is at least $1000.
Return a table with:
week_start
(DATE)
num_customers_ge_1000_ytd
(INT)
You may solve this in SQL or Python (pandas).