Compute CTR for peak vs non-peak hours
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Technical Screen
You work on a video ads platform. You are given three tables and asked to compare CTR during **peak hours** vs **non-peak hours**.
### Tables
Assume the following schemas:
**ads**
- `ad_id` STRING (PK)
- `advertiser_id` STRING
- `ad_type` STRING -- e.g., `direct`, `brand`
- `created_at` TIMESTAMP
**impressions**
- `impression_id` STRING (PK)
- `ad_id` STRING (FK → `ads.ad_id`)
- `user_id` STRING
- `impression_ts` TIMESTAMP
- `clicked` BOOLEAN -- whether the impression resulted in a click
**conversions** (not necessarily needed for CTR, but available)
- `conversion_id` STRING (PK)
- `ad_id` STRING (FK → `ads.ad_id`)
- `user_id` STRING
- `conversion_ts` TIMESTAMP
### Definitions / assumptions
- **CTR** = (# clicks) / (# impressions).
- Use `impressions.clicked` to count clicks.
- Evaluate CTR by **hour-of-day** (0–23) in a single timezone (state the timezone you use, e.g., UTC).
- **Peak hours**: hours whose total impressions are in the **top 20%** of hours by impression volume for the analysis period (if your SQL dialect needs a different approach, describe it).
### Task
Write a SQL query that outputs CTR for:
- peak hours
- non-peak hours
Required output columns:
- `hour_type` ENUM(`peak`,`non_peak`)
- `impressions`
- `clicks`
- `ctr`
(Optionally include the CTR difference `peak_ctr - non_peak_ctr`.)
Quick Answer: This question evaluates a candidate's ability to compute time-based click-through rate metrics using SQL and data manipulation techniques, including joins, aggregations, timezone-aware hour extraction, and percentile-based classification of peak hours.