Evaluates SQL data-manipulation skills—specifically joins, time-based filtering, handling NULL end times, status-based predicates, aggregation, and distinct counts—within the Data Manipulation (SQL/Python) domain for a Data Scientist role at a practical implementation level.
You work on an ads platform and need to report Active Ad Revenue by the ad’s creation source.
Assume all timestamps are in UTC.
ads
ad_id
BIGINT (PK)
advertiser_id
BIGINT
creation_source
VARCHAR -- e.g., 'UI', 'API', 'BULK_UPLOAD', 'PARTNER'
status
VARCHAR -- e.g., 'ACTIVE', 'PAUSED', 'DELETED'
start_time
TIMESTAMP -- when the ad is scheduled to start serving
end_time
TIMESTAMP -- when the ad stops serving (nullable = no scheduled end)
ad_impressions
impression_id
BIGINT (PK)
ad_id
BIGINT (FK -> ads.ad_id)
user_id
BIGINT
impression_ts
TIMESTAMP
revenue_usd
DECIMAL(12,4) -- revenue attributed to this impression
An impression counts toward Active Ad Revenue if, at impression_ts:
status = 'ACTIVE'
, and
impression_ts >= start_time
, and
end_time
IS NULL OR
impression_ts < end_time
).
Write a SQL query to compute, for the date range 2024-01-01 to 2024-01-31 (inclusive by date):
creation_source
active_ad_revenue_usd
= SUM of
revenue_usd
for active impressions in the range
active_ads
= number of distinct
ad_id
that had
at least one active impression
in the range
Order results by active_ad_revenue_usd descending.