Compute ad impression conversion rates
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Onsite
You are given two tables for an ads product.
Table: `ad_impressions`
| Column | Type | Description |
|---|---:|---|
| `impression_id` | STRING | Unique identifier for an ad impression. Primary key. |
| `user_id` | STRING | User who saw the ad. |
| `ad_id` | STRING | Ad that was shown. |
| `campaign_id` | STRING | Campaign that owns the ad. |
| `impression_ts` | TIMESTAMP | Time the impression was served, stored in UTC. |
| `cost_usd` | NUMERIC | Advertiser cost for the impression in USD. |
Table: `ad_conversions`
| Column | Type | Description |
|---|---:|---|
| `conversion_id` | STRING | Unique identifier for a conversion event. Primary key. |
| `user_id` | STRING | User who converted. |
| `ad_id` | STRING | Ad associated with the conversion event. |
| `conversion_ts` | TIMESTAMP | Time the conversion occurred, stored in UTC. |
| `revenue_usd` | NUMERIC | Conversion value in USD. |
Attribution rule:
- A conversion can be attributed only to an impression from the same `user_id` and `ad_id`.
- The impression must occur before the conversion.
- The conversion must occur within 7 days after the impression.
- If multiple impressions are eligible for the same conversion, attribute the conversion to the most recent eligible impression before the conversion.
- Each conversion should be attributed at most once.
Write a SQL query for impressions served from `2024-01-01 00:00:00 UTC` through `2024-01-31 23:59:59 UTC`. For each `campaign_id` and UTC `impression_date`, return:
- `campaign_id`
- `impression_date`
- `impressions`: number of impressions
- `unique_impression_users`: number of distinct users with impressions
- `attributed_conversions`: number of conversions attributed to those impressions
- `conversion_rate`: attributed conversions divided by impressions
- `total_spend_usd`: total impression cost
- `total_conversion_revenue_usd`: total attributed conversion revenue
- `avg_hours_to_conversion`: average hours between impression and attributed conversion, across attributed conversions only
Rows with impressions but zero attributed conversions should still appear with zero conversions and zero conversion revenue.
Quick Answer: This question evaluates proficiency in time-based event attribution, aggregation of business metrics (impressions, conversion rates, spend and revenue), timestamp handling, and user/ad-level matching within the Data Manipulation (SQL/Python) domain, emphasizing practical application over abstract theory.