Analyze spend and creation-source shifts
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Onsite
You are working with ads data. Assume the following tables, with all timestamps interpreted in UTC.
- `advertisers(advertiser_id BIGINT, advertiser_category VARCHAR, created_at TIMESTAMP)`
- `ads(ad_id BIGINT, advertiser_id BIGINT, creation_source VARCHAR, created_at TIMESTAMP)`
- `ad_daily_stats(stat_date DATE, ad_id BIGINT, is_active BOOLEAN, spend_usd DECIMAL(18,2), impressions BIGINT, clicks BIGINT)`
Relationships:
- `ads.advertiser_id` references `advertisers.advertiser_id`
- `ad_daily_stats.ad_id` references `ads.ad_id`
Definitions:
- An `active ad` on a given date is an ad with `is_active = TRUE` in `ad_daily_stats` for that `stat_date`.
- `Last year` means `2023-01-01` through `2023-12-31`.
- `This year` means `2024-01-01` through `2024-12-31`.
- For part 1, define `pct_spend_in_2024` as `spend_2024 / (spend_2023 + spend_2024)`.
- For part 2, report `month` as the first day of the calendar month, and define `share_of_monthly_spend` as `total_spend_usd / total included spend in that month`.
Write SQL for the following tasks:
1. Find all advertisers whose total spend in 2023 exceeded 1000 USD. For each qualifying advertiser, return:
- `advertiser_id`
- `spend_2023`
- `spend_2024`
- `pct_spend_in_2024`
2. Investigate whether growth in one `creation_source` may have been driven by a decline in another source. Build a monthly report for 2023-2024 by `creation_source`, while excluding selected advertiser categories such as `Political`, `Government`, and `House Ads`. Return:
- `month`
- `creation_source`
- `active_ads`
- `distinct_advertisers`
- `total_spend_usd`
- `share_of_monthly_spend`
Quick Answer: This question evaluates a data scientist's competency in SQL-based data manipulation, time-series aggregation, joins, and metric computation for analyzing advertiser spend and shifts by creation_source.