This question evaluates a data scientist's competency in SQL-based data manipulation, time-series aggregation, joins, and metric computation for analyzing advertiser spend and shifts by creation_source.
You are working with ads data. Assume the following tables, with all timestamps interpreted in UTC.
advertisers(advertiser_id BIGINT, advertiser_category VARCHAR, created_at TIMESTAMP)
ads(ad_id BIGINT, advertiser_id BIGINT, creation_source VARCHAR, created_at TIMESTAMP)
ad_daily_stats(stat_date DATE, ad_id BIGINT, is_active BOOLEAN, spend_usd DECIMAL(18,2), impressions BIGINT, clicks BIGINT)
Relationships:
ads.advertiser_id
references
advertisers.advertiser_id
ad_daily_stats.ad_id
references
ads.ad_id
Definitions:
active ad
on a given date is an ad with
is_active = TRUE
in
ad_daily_stats
for that
stat_date
.
Last year
means
2023-01-01
through
2023-12-31
.
This year
means
2024-01-01
through
2024-12-31
.
pct_spend_in_2024
as
spend_2024 / (spend_2023 + spend_2024)
.
month
as the first day of the calendar month, and define
share_of_monthly_spend
as
total_spend_usd / total included spend in that month
.
Write SQL for the following tasks:
advertiser_id
spend_2023
spend_2024
pct_spend_in_2024
creation_source
may have been driven by a decline in another source. Build a monthly report for 2023-2024 by
creation_source
, while excluding selected advertiser categories such as
Political
,
Government
, and
House Ads
. Return:
month
creation_source
active_ads
distinct_advertisers
total_spend_usd
share_of_monthly_spend