This question evaluates SQL data manipulation and time-series analytics skills, focusing on interval overlap logic, aggregation of distinct entities, and cumulative revenue calculations.
You work on a theme marketplace. Some shops install pirated themes instead of paying for official themes.
Assume all timestamps are in UTC.
shopsshop_id
BIGINT PRIMARY KEY
created_at
TIMESTAMP
theme_installsRepresents a shop having a theme active for a time range.
shop_id
BIGINT (FK →
shops.shop_id
)
theme_id
BIGINT
valid_from
TIMESTAMP
valid_to
TIMESTAMP NULL — NULL means the install is still active
pirated_themesList of themes known to be pirated.
theme_id
BIGINT PRIMARY KEY
theme_pricesMonthly price for an official theme.
theme_id
BIGINT PRIMARY KEY
price_usd_per_month
NUMERIC(10,2)
calendar_monthsA helper table with one row per month.
month_start
DATE PRIMARY KEY — e.g., 2024-01-01
For each calendar month, compute the percent of shops that have at least one active install of a pirated theme at any time during that month.
valid_from < next_month_start AND coalesce(valid_to, <as_of_timestamp>) >= month_start
COALESCE(valid_to, <as_of_timestamp>)
to handle NULL
valid_to
.
month_start
pirated_shops
(count of distinct
shop_id
with pirated theme active during the month)
total_shops
(count of distinct
shop_id
with any theme active during the month)
pirated_shop_pct
(
pirated_shops / total_shops
)
Assume each pirated theme would have generated revenue equal to its official theme’s monthly price in theme_prices while it is active.
Compute:
monthly_revenue_loss_usd
: for each month, sum
price_usd_per_month
across distinct
(shop_id, theme_id)
that are pirated and active during the month.
cumulative_revenue_loss_usd
: the rolling cumulative sum up through each month (i.e., “as of January”, “as of February”, …).
month_start
monthly_revenue_loss_usd
cumulative_revenue_loss_usd
Notes:
calendar_months
and a join to expand installs into months.
valid_to
using
COALESCE
.