Analyze spend cohort and source shifts
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Onsite
You work on an ads platform. Assume all timestamps are in UTC. Interpret **last year** as calendar year 2023 and **this year** as calendar year 2024.
Tables:
- `advertisers(advertiser_id BIGINT, advertiser_type VARCHAR, country VARCHAR, created_at TIMESTAMP)`
- `ads(ad_id BIGINT, advertiser_id BIGINT, creation_source VARCHAR)` where `creation_source` can take values such as `MANUAL`, `AI_ASSISTED`, `IMAGE_TEMPLATE`, and `VIDEO_TEMPLATE`
- `ad_spend_daily(spend_date DATE, ad_id BIGINT, spend_usd DECIMAL(18,2))`
Key relationships:
- One advertiser can have many ads.
- One ad can have many daily spend records.
Questions:
1. Identify advertisers whose **total spend in 2023 exceeded 1,000 USD**. Among all platform spend in 2024, what **percentage of 2024 spend** came from this advertiser cohort? Return the following columns: `eligible_advertisers`, `cohort_spend_2024`, `total_platform_spend_2024`, and `cohort_share_2024`.
2. PMs noticed that spend from one ad creation source is growing and want to know whether that growth may be driven by a decline in another source. Write a query that returns **monthly 2024 spend by `creation_source`**, while excluding advertisers whose `advertiser_type` is in `('INTERNAL', 'POLITICAL', 'TEST')`. Also include, for each month, the amount of spend from advertisers whose `AI_ASSISTED` spend increased versus the same month in 2023 while their `MANUAL` spend decreased, so the team can assess possible cannibalization between creation sources.
Quick Answer: This question evaluates data manipulation and analytical SQL/Python skills, focusing on cohort analysis, time-series aggregation, joins and filtering, and comparative spend analytics across calendar years.