A ride-hailing team runs an A/B test in San Francisco in July 2024 for a new routing algorithm intended to reduce time to pickup, abbreviated TTP.
You are given two pandas DataFrames:
df_users:
-
user_id
: unique user identifier
-
variant
: experiment assignment, either
control
or
treatment
df_rides:
-
ride_id
: unique ride identifier
-
user_id
: user identifier
-
ride_date
: ride timestamp or date
-
city
: ride city
-
time_to_pickup
: numeric TTP in minutes
Tasks:
-
Write Python code to filter to San Francisco rides in July 2024, join ride records to experiment assignments, run a Welch two-sample t-test comparing treatment versus control TTP, and return the p-value.
-
Given the returned p-value, how would you decide whether the result is statistically significant?
-
Is a statistically significant p-value conclusive proof that the new routing algorithm is better? If not, explain the main threats to validity.
-
Propose a stronger experiment design and analysis plan for this routing algorithm.