This question evaluates competency in temporal data manipulation, SQL joins and aggregations, distinct counting, and revenue-impact estimation within a Data Manipulation (SQL/Python) context for a data scientist role.
You are analyzing theme piracy on an e-commerce platform.
Assume the analysis window is 2023-01-01 through 2023-12-31, all timestamps are stored in UTC, and all monthly buckets should be computed by calendar month in UTC.
shop_id BIGINT
created_at TIMESTAMP
theme_id BIGINT
theme_name STRING
is_pirated BOOLEAN
pirated_theme_id BIGINT
legitimate_theme_id BIGINT
theme_id BIGINT
monthly_license_fee_usd NUMERIC(10,2)
install_id BIGINT
shop_id BIGINT
theme_id BIGINT
installed_at TIMESTAMP
valid_from DATE
valid_to DATE NULL
month_start DATE
theme_installations.shop_id -> shops.shop_id
theme_installations.theme_id -> themes.theme_id
pirated_theme_map.pirated_theme_id -> themes.theme_id
pirated_theme_map.legitimate_theme_id -> theme_pricing.theme_id
theme_installations
whose
theme_id
appears in
pirated_theme_map.pirated_theme_id
.
valid_to
is
NULL
, treat the installation as active through
2023-12-31
.
monthly_license_fee_usd
of the mapped legitimate theme for every month in which the pirated installation is active for at least one day.
shop_id
values that installed at least one pirated theme during that month
shop_id
values that installed any theme during that month
month_start
pirated_install_shops
all_install_shops
pirated_install_rate
shop_id
values with at least one pirated theme active at any point during that month
shop_id
values with at least one theme active at any point during that month
month_start
active_pirated_shops
active_theme_shops
active_pirated_rate
month_start
monthly_revenue_loss_usd
cumulative_revenue_loss_usd
Write SQL only.