Compute this-year spend share of last-year whales
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Onsite
## Context
You work on an ads analytics dataset. Assume all timestamps are in **UTC**, and “last year” / “this year” refer to the **previous/current calendar year** (e.g., based on `CURRENT_DATE`).
## Tables
### `ad_spend_daily`
Daily spend at the ad level.
- `spend_date` DATE
- `advertiser_id` BIGINT
- `ad_id` BIGINT
- `spend_usd` NUMERIC(18,2)
- `creation_source` STRING
*(e.g., `'manual'`, `'ai_assisted'`, `'api'`, `'bulk_upload'`)*
### `advertiser_dim`
Advertiser attributes.
- `advertiser_id` BIGINT **PK**
- `advertiser_category` STRING
*(used to exclude certain types; e.g., `'gaming'`, `'adult'`, `'political'`, `'internal_test'`)*
- `is_internal` BOOLEAN
- `status` STRING *(e.g., `'active'`, `'disabled'`)*
### `ad_dim`
Ad metadata.
- `ad_id` BIGINT **PK**
- `advertiser_id` BIGINT **FK → advertiser_dim.advertiser_id**
- `is_active` BOOLEAN
## Tasks
1) **High-spender cohort share**: Find advertisers whose **total spend last calendar year** was **> $1,000**. For those advertisers, compute what **percentage of total spend this calendar year** they account for.
Required output (1 row):
- `this_year_total_spend_usd`
- `cohort_this_year_spend_usd`
- `cohort_share_of_this_year` (a fraction or %)
2) **Excluding advertisers**: Modify your logic so the computed share **excludes** advertisers that match certain criteria (assume you are given a list), e.g.:
- `advertiser_category IN (...)` OR
- `is_internal = TRUE` OR
- `status != 'active'`
3) **Creation-source rollup (for substitution analysis)**: Write a query to output **monthly spend** by `creation_source` for the last **N months**, while applying the same advertiser exclusions.
Required output:
- `month`
- `creation_source`
- `spend_usd`
Quick Answer: This question evaluates proficiency in data manipulation and analytics engineering, specifically SQL and Python skills for aggregations, joins, calendar-year time filtering, cohort identification, exclusion logic, and monthly rollups of ad spend.