Calculate January-2024 SF Promotion Impact Using SQL Queries
Company: Uber
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
campaign_users
+---------+-----------+
| user_id | treatment |
+---------+-----------+
| 1001 | control |
| 1002 | test |
| 1003 | control |
| 1004 | test |
| 1005 | control |
+---------+-----------+
sessions
+---------+-----------+--------+---------------------+
| user_id | session_id| region | timestamp |
+---------+-----------+--------+---------------------+
| 1001 | s1 | SF | 2024-01-05 10:00:00 |
| 1002 | s2 | NY | 2024-01-06 11:00:00 |
| 1002 | s3 | SF | 2024-01-10 12:00:00 |
| 1004 | s4 | SF | 2024-01-12 09:30:00 |
| 1005 | s5 | SF | 2024-02-01 08:20:00 |
+---------+-----------+--------+---------------------+
orders
+---------+-----------+----------+---------+--------+
| user_id | session_id| order_id | revenue | profit |
+---------+-----------+----------+---------+--------+
| 1001 | s1 | o101 | 50.00 | 10.00 |
| 1002 | s2 | o102 | 80.00 | 15.00 |
| 1002 | s3 | o103 | 45.00 | 12.00 |
| 1004 | s4 | o104 | 60.00 | 20.00 |
| 1004 | s4 | o105 | 20.00 | 5.00 |
+---------+-----------+----------+---------+--------+
##### Scenario
Using three relational tables that record campaign assignment, user sessions, and orders, evaluate the January-2024 SF-only promotion experiment and perform follow-up analyses.
##### Question
Write a SQL query that joins the three tables to compute average revenue per user for treatment vs. control, restricting data to sessions in January-2024 and region = 'SF'. For each treatment arm, return the three user_id values with the highest number of orders. For each treatment arm, calculate the percentage of users who have placed more than five orders.
##### Hints
Use proper JOINs, WHERE filters on date and region, GROUP BY treatment, COUNT orders, window functions or conditional aggregation as needed.
Quick Answer: This question evaluates a candidate's proficiency in relational data manipulation techniques—specifically joins, date and region filtering, aggregation, top‑N identification, and percentage calculations applied to experiment and user-level metrics.