Determine Tutorial Completion Rate by Variant in 7 Days
Company: Confluent
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
users
| user_id | sign_up_at | tutorial_variant |
| 101 | 2023-08-01 | original |
| 102 | 2023-08-02 | variant |
| 103 | 2023-08-02 | original |
| 104 | 2023-08-03 | variant |
| 105 | 2023-08-03 | original |
tutorial_events
| event_id | user_id | step_index | event_time |
| 1 | 101 | 1 | 2023-08-01 10:00:00 |
| 2 | 101 | 2 | 2023-08-01 10:02:00 |
| 3 | 102 | 1 | 2023-08-02 11:00:00 |
| 4 | 104 | 1 | 2023-08-03 09:00:00 |
| 5 | 105 | 1 | 2023-08-03 12:00:00 |
##### Scenario
Confluent wants to understand the impact of two onboarding-tutorial variants (original vs variant) on tutorial completion.
##### Question
Given the tables below, write a SQL query that returns, for each tutorial_variant, the share of signed-up users who reach the maximum step_index (i.e., complete the tutorial) within 7 days of their sign_up_at date.
##### Hints
Join users with tutorial_events, find each user’s max step, compare to global max per variant, filter by 7-day window, then aggregate.
Quick Answer: This question evaluates a candidate's competency in data manipulation and event-based analytics, focusing on skills such as joining user and event tables, aggregating event data, and computing cohort completion metrics by variant.