Query email logs for deliverability insights
Company: Microsoft
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Using the schema and sample data below, write SQL to answer parts a–c. Assume timestamps are UTC. Today is 2025-09-01.
Schema:
- Emails(sent_id INT, message_id TEXT, provider TEXT, from_addr TEXT, to_domain TEXT, sent_ts TIMESTAMP, subject TEXT)
- Deliveries(message_id TEXT, status TEXT, event_ts TIMESTAMP, detail TEXT)
Sample data:
Emails
| sent_id | message_id | provider | from_addr | to_domain | sent_ts | subject |
| 1 | m1 | Gmail | candidate@gmail.com | ms.com | 2025-08-30 09:00 | avail slots |
| 2 | m2 | Gmail | candidate@gmail.com | ms.com | 2025-08-31 10:00 | follow-up |
| 3 | m3 | Outlook | candidate@outlook.com | ms.com | 2025-08-31 10:05 | follow-up alt |
| 4 | m4 | Gmail | candidate@gmail.com | ms.com | 2025-09-01 08:00 | confirmation |
| 5 | m5 | Outlook | candidate@outlook.com | ms.com | 2025-08-26 20:00 | initial reach |
| 6 | m6 | Gmail | candidate@gmail.com | otherco.com| 2025-08-27 09:00 | sanity check |
Deliveries
| message_id | status | event_ts | detail |
| m1 | bounced | 2025-08-30 09:01 | 550 5.7.1 |
| m1 | delivered | 2025-08-30 09:20 | |
| m3 | delivered | 2025-08-31 10:06 | |
| m4 | spam_folder | 2025-09-01 08:00 | filtered |
| m5 | delivered | 2025-08-26 20:01 | |
| m6 | delivered | 2025-08-27 09:01 | |
Note: message_id m2 has no rows in Deliveries (treat as undelivered).
Tasks:
(a) For emails sent in the last 7 days relative to 2025-09-01 (window 2025-08-26 through 2025-09-01 inclusive), compute by provider and to_domain the proportion delivered within 5 minutes of sent_ts. Rules: use the earliest event with status = 'delivered'; ignore 'spam_folder'; treat messages with no delivery rows as undelivered; if a message bounces then later delivers, count it as delivered only if the delivered event occurs within 5 minutes of sent_ts.
(b) Return all message_ids where the first observed event was a bounce and a later delivered event occurred more than 5 minutes after sent_ts; include the lag in minutes between sent_ts and the first delivered event.
(c) For domain ms.com over the same window, determine the earliest calendar date on which switching from Gmail to Outlook would have strictly improved deliverability, based on same-day cumulative delivered-within-5-minutes rates up to each send. Output that date and show the cumulative rates used to justify the switch.
Quick Answer: This question evaluates SQL-based data manipulation skills focused on event-time analytics, time-windowed aggregations, temporal joins, handling missing or multiple event rows, and computing deliverability metrics; it is categorized under Data Manipulation (SQL/Python) for a Data Scientist role.