Assume you are working with a food delivery dataset.
Tables (schemas)
users
-
user_id INT PRIMARY KEY
-
annual_income_usd INT
-
signup_ts TIMESTAMP
restaurants
-
restaurant_id INT PRIMARY KEY
-
neighborhood_income_usd INT -- e.g., median income of the restaurant’s census tract
orders
-
order_id INT PRIMARY KEY
-
user_id INT REFERENCES users(user_id)
-
restaurant_id INT REFERENCES restaurants(restaurant_id)
-
order_ts TIMESTAMP -- UTC
-
order_amount_usd NUMERIC(10,2)
-
status STRING -- e.g., 'completed', 'canceled'
Definitions / assumptions
-
Only include orders with status = 'completed'.
-
“Month” means calendar month in UTC: DATE_TRUNC('month', order_ts).
-
When ties occur for “top spender”, return the smallest user_id.
Tasks
-
Monthly spend > $100:
For each user-month, compute total spend. Return rows where the user’s monthly spend is strictly greater than 100.
-
Output: month, user_id, monthly_spend
-
Top spender each month among (1):
Considering only the user-months returned by (1), find the top spender for each month.
-
Output: month, user_id, monthly_spend
-
Restaurant monthly spend change (MoM):
For each restaurant and month, compute monthly revenue and the month-over-month change (absolute and percent) versus the previous month.
-
Output: restaurant_id, month, monthly_revenue, prev_month_revenue, mom_change, mom_change_pct
-
Only output months where a previous month exists for that restaurant.
-
Share of customers ordering from bottom-income restaurants:
Define “bottom 25% income restaurants” as restaurants whose neighborhood_income_usd is in the bottom quartile across all restaurants (using NTILE/percentile logic).
For each month, compute:
-
numerator = number of distinct users who placed ≥1 completed order in that month at a bottom-quartile restaurant
-
denominator = number of distinct users who placed ≥1 completed order in that month at any restaurant
-
ratio = numerator / denominator
Output: month, numerator_users, denominator_users, ratio