Calculate survey response and quality metrics in SQL
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Compute survey response-rate and quality metrics from event data. Assume "today" = 2025-09-01, and compute over the last 7 days (2025-08-26 to 2025-09-01, inclusive).
Schema (invented):
- impressions(impression_id BIGINT, user_id BIGINT, survey_id INT, ts TIMESTAMP)
- clicks(click_id BIGINT, impression_id BIGINT, ts TIMESTAMP)
- surveys(survey_id INT, version VARCHAR, active_from DATE)
- survey_responses(response_id BIGINT, user_id BIGINT, survey_id INT, score INT, ts TIMESTAMP)
Sample data (minimal):
impressions
| impression_id | user_id | survey_id | ts |
| 1 | 101 | 10 | 2025-08-30 10:00:00 |
| 2 | 102 | 10 | 2025-08-30 10:05:00 |
| 3 | 101 | 11 | 2025-08-31 12:00:00 |
| 4 | 103 | 10 | 2025-09-01 09:00:00 |
| 5 | 104 | 10 | 2025-09-01 09:05:00 |
clicks
| click_id | impression_id | ts |
| 1 | 1 | 2025-08-30 10:01:00 |
| 2 | 1 | 2025-08-30 10:02:00 |
| 3 | 3 | 2025-08-31 12:05:00 |
| 4 | 5 | 2025-09-01 09:06:00 |
surveys
| survey_id | version | active_from |
| 10 | v1 | 2025-08-01 |
| 11 | v2 | 2025-08-25 |
survey_responses
| response_id | user_id | survey_id | score | ts |
| 1 | 101 | 10 | 4 | 2025-08-30 10:03:00 |
| 2 | 101 | 10 | 5 | 2025-08-30 10:10:00 |
| 3 | 101 | 11 | 3 | 2025-08-31 12:06:00 |
| 4 | 104 | 10 | 2 | 2025-09-01 09:07:00 |
Tasks (write precise SQL; assume ANSI SQL syntax):
1) Daily survey response rate by survey version: response_rate = unique_clicked_impressions / total_impressions per calendar date in the 7-day window. Count at most 1 click per impression. Join to surveys to report by version.
2) Overall response rate for the full 7-day window (single number) and the same metric broken out by survey version.
3) Quality metrics by survey version for the same window, using the provided score column in two ways:
a) First-score-only: for each user+survey_id, take their chronologically first score in the window; average across those user+survey pairs.
b) All-scores: average across all scores in the window (i.e., multiple scores from the same user count multiple times).
4) Ensure that scores are associated with impressions/clicks from the same window. If a response has no corresponding impression within the window, exclude it. If multiple impressions exist for the same user+survey_id, treat responses as eligible if they occur at or after the first click for that user+survey_id on that day.
5) Return: date, survey_version, total_impressions, unique_clicked_impressions, response_rate, avg_score_first_only, avg_score_all.
State any assumptions you make explicitly (e.g., timezone, handling of missing joins).
Quick Answer: This question evaluates SQL proficiency in computing time-windowed metrics, including deduplication, joins between event tables, aggregation, timestamp alignment, and data-quality handling for survey response and score calculations.