Write SQL/Python for CTR analytics

Q: Write SQL/Python for CTR analytics

This question evaluates proficiency in data manipulation and analytics, covering SQL windowing and deduplication for time‑windowed cohort metrics (48‑hour unique CTR) as well as practical Python model evaluation tasks such as precision‑recall plotting, weighted F1 threshold selection, and precision@top‑k.

Q: How do I practice SQL interview questions?

PracHub provides an interactive SQL console where you can write and test queries against real database schemas. Get instant feedback and compare your solution with the expected output.

Q: What difficulty level is this coding question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Uber.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Uber during technical interviews.

Question

Part A — SQL (use the schema and sample data below): Compute 48-hour unique CTR for campaign_id=100 by variant, deduplicating to the earliest send per (user_id, campaign_id) on/after 2025-08-30, and excluding internal accounts user_id <= 2. Define CTR as distinct users with ≥1 click in [send_time, send_time + 48h] divided by distinct users sent (after deduplication). Return columns: campaign_id, variant, sends, unique_clickers_48h, ctr_48h. Then, also return a single row with the absolute lift (test_ctr − control_ctr). Optionally, include counts needed to compute a 95% CI for the lift using the normal approximation so it can be calculated downstream.

Schema:
users(user_id INT, signup_dt DATE, locale STRING)
email_sends(send_id INT, user_id INT, campaign_id INT, send_time TIMESTAMP, variant STRING) -- variant in ('control','test')
email_events(event_id INT, send_id INT, event_type STRING, event_time TIMESTAMP) -- event_type in ('open','click','unsubscribe')

Sample data:
users
+---------+------------+--------+
| user_id | signup_dt | locale |
+---------+------------+--------+
| 1 | 2025-08-20 | US |
| 2 | 2025-08-22 | US |
| 3 | 2025-08-25 | CA |
| 4 | 2025-08-27 | US |
| 5 | 2025-08-28 | GB |

email_sends
+---------+---------+------------+---------------------+----------+
| send_id | user_id | campaign_id| send_time | variant |
+---------+---------+------------+---------------------+----------+
| 10 | 1 | 100 | 2025-08-30 09:00:00 | control |
| 11 | 2 | 100 | 2025-08-30 09:00:00 | control |
| 12 | 3 | 100 | 2025-08-30 09:00:00 | test |
| 13 | 1 | 100 | 2025-08-31 09:00:00 | test | -- resend
| 14 | 4 | 101 | 2025-08-31 10:00:00 | control |
| 15 | 5 | 100 | 2025-08-30 09:00:00 | test |

email_events
+----------+---------+------------+---------------------+
| event_id | send_id | event_type | event_time |
+----------+---------+------------+---------------------+
| 1000 | 10 | open | 2025-08-30 09:05:00 |
| 1001 | 10 | click | 2025-08-30 09:06:00 |
| 1002 | 11 | open | 2025-08-30 09:10:00 |
| 1003 | 12 | open | 2025-08-30 09:07:00 |
| 1004 | 12 | unsubscribe| 2025-08-30 10:00:00 |
| 1005 | 13 | click | 2025-08-31 09:12:00 |
| 1006 | 15 | click | 2025-08-31 09:00:00 |
| 1007 | 14 | click | 2025-08-31 10:05:00 |

Part B — Python (pandas/sklearn): Given a DataFrame df with columns y_true (0/1), y_prob (predicted probability), and weight (non-negative), write code to: (1) plot the precision-recall curve; (2) find the threshold that maximizes weighted F1 using 'weight' as sample weights; (3) compute precision@top1% of the population by scoring, breaking ties deterministically. Then, explain why a model can have ROC-AUC=0.86 but PR-AUC=0.18 at 1% prevalence, and which metric you would optimize for a marketing CTR use case.

PracHub · Accepted Answer

This question evaluates proficiency in data manipulation and analytics, covering SQL windowing and deduplication for time‑windowed cohort metrics (48‑hour unique CTR) as well as practical Python model evaluation tasks such as precision‑recall plotting, weighted F1 threshold selection, and precision@top‑k.

Quick Overview

Quick Overview