PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates proficiency in data manipulation and analytics, covering SQL windowing and deduplication for time‑windowed cohort metrics (48‑hour unique CTR) as well as practical Python model evaluation tasks such as precision‑recall plotting, weighted F1 threshold selection, and precision@top‑k.

  • Medium
  • Uber
  • Data Manipulation (SQL/Python)
  • Data Scientist

Write SQL/Python for CTR analytics

Company: Uber

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Part A — SQL (use the schema and sample data below): Compute 48-hour unique CTR for campaign_id=100 by variant, deduplicating to the earliest send per (user_id, campaign_id) on/after 2025-08-30, and excluding internal accounts user_id <= 2. Define CTR as distinct users with ≥1 click in [send_time, send_time + 48h] divided by distinct users sent (after deduplication). Return columns: campaign_id, variant, sends, unique_clickers_48h, ctr_48h. Then, also return a single row with the absolute lift (test_ctr − control_ctr). Optionally, include counts needed to compute a 95% CI for the lift using the normal approximation so it can be calculated downstream. Schema: users(user_id INT, signup_dt DATE, locale STRING) email_sends(send_id INT, user_id INT, campaign_id INT, send_time TIMESTAMP, variant STRING) -- variant in ('control','test') email_events(event_id INT, send_id INT, event_type STRING, event_time TIMESTAMP) -- event_type in ('open','click','unsubscribe') Sample data: users +---------+------------+--------+ | user_id | signup_dt | locale | +---------+------------+--------+ | 1 | 2025-08-20 | US | | 2 | 2025-08-22 | US | | 3 | 2025-08-25 | CA | | 4 | 2025-08-27 | US | | 5 | 2025-08-28 | GB | email_sends +---------+---------+------------+---------------------+----------+ | send_id | user_id | campaign_id| send_time | variant | +---------+---------+------------+---------------------+----------+ | 10 | 1 | 100 | 2025-08-30 09:00:00 | control | | 11 | 2 | 100 | 2025-08-30 09:00:00 | control | | 12 | 3 | 100 | 2025-08-30 09:00:00 | test | | 13 | 1 | 100 | 2025-08-31 09:00:00 | test | -- resend | 14 | 4 | 101 | 2025-08-31 10:00:00 | control | | 15 | 5 | 100 | 2025-08-30 09:00:00 | test | email_events +----------+---------+------------+---------------------+ | event_id | send_id | event_type | event_time | +----------+---------+------------+---------------------+ | 1000 | 10 | open | 2025-08-30 09:05:00 | | 1001 | 10 | click | 2025-08-30 09:06:00 | | 1002 | 11 | open | 2025-08-30 09:10:00 | | 1003 | 12 | open | 2025-08-30 09:07:00 | | 1004 | 12 | unsubscribe| 2025-08-30 10:00:00 | | 1005 | 13 | click | 2025-08-31 09:12:00 | | 1006 | 15 | click | 2025-08-31 09:00:00 | | 1007 | 14 | click | 2025-08-31 10:05:00 | Part B — Python (pandas/sklearn): Given a DataFrame df with columns y_true (0/1), y_prob (predicted probability), and weight (non-negative), write code to: (1) plot the precision-recall curve; (2) find the threshold that maximizes weighted F1 using 'weight' as sample weights; (3) compute precision@top1% of the population by scoring, breaking ties deterministically. Then, explain why a model can have ROC-AUC=0.86 but PR-AUC=0.18 at 1% prevalence, and which metric you would optimize for a marketing CTR use case.

Quick Answer: This question evaluates proficiency in data manipulation and analytics, covering SQL windowing and deduplication for time‑windowed cohort metrics (48‑hour unique CTR) as well as practical Python model evaluation tasks such as precision‑recall plotting, weighted F1 threshold selection, and precision@top‑k.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Transform DataFrame and compute diff-in-diff - Uber (easy)
  • Write SQL for active counts and YTD top driver - Uber (Medium)
  • Write SQL and Pandas for Uber Trips - Uber (Medium)
  • Compute ETA shift and conversion uplift - Uber (Medium)
  • Clean, split, merge, and aggregate with pandas - Uber (Medium)