Assume 'today' = 2025-09-01. You are given the following schema and small ASCII samples.
Tables
riders(rider_id, name, signup_date)
drivers(driver_id, name, car_year)
trips(trip_id, rider_id, driver_id, request_time, pickup_time, dropoff_time, status, city, surge_multiplier, distance_km, fare_usd)
payments(trip_id, amount_usd, method, card_hash, charge_time, success)
devices(user_type, user_id, device_id)
Samples (minimal, not exhaustive)

riders
rider_id | name  | signup_date
1        | Alice | 2025-08-20
2        | Bob   | 2025-08-28
3        | Chen  | 2025-08-30
4        | Deepa | 2025-07-15

drivers
driver_id | name  | car_year
10        | Diego | 2018
11        | Elena | 2021
12        | Farid | 2015

trips
trip_id | rider_id | driver_id | request_time     | pickup_time      | dropoff_time     | status              | city | surge_multiplier | distance_km | fare_usd
1       | 1        | 10        | 2025-08-25 19:55 | 2025-08-25 20:05 | 2025-08-25 20:25 | completed           | SF   | 1.8              | 8.0         | 24.5
2       | 2        | 11        | 2025-08-26 14:10 | 2025-08-26 14:15 | 2025-08-26 14:30 | completed           | SF   | 1.0              | 5.0         | 12.0
3       | 2        | 11        | 2025-08-26 20:05 | 2025-08-26 20:30 | NULL             | cancelled_by_driver | SF   | 2.0              | 0.0         | 0.0
4       | 3        | 12        | 2025-08-27 02:30 | NULL             | NULL             | cancelled_by_rider  | NYC  | 1.0              | 0.0         | 0.0
5       | 3        | 10        | 2025-08-30 20:15 | 2025-08-30 20:25 | 2025-08-30 20:45 | completed           | NYC  | 1.7              | 6.5         | 18.0
6       | 1        | 12        | 2025-08-31 14:05 | 2025-08-31 14:10 | 2025-08-31 14:30 | completed           | SF   | 1.0              | 7.0         | 16.0
7       | 4        | 10        | 2025-08-31 20:05 | 2025-08-31 20:20 | 2025-08-31 20:40 | completed           | SF   | 1.6              | 9.0         | 22.0
8       | 4        | 11        | 2025-09-01 02:10 | NULL             | NULL             | driver_no_show      | SF   | 1.0              | 0.0         | 0.0

payments
trip_id | amount_usd | method | card_hash | charge_time      | success
1       | 24.5       | card   | abc123    | 2025-08-25 20:26 | true
2       | 12.0       | card   | xyz777    | 2025-08-26 14:31 | true
3       | 0.0        | card   | xyz777    | 2025-08-26 20:31 | false
4       | 0.0        | card   | qwe555    | 2025-08-27 02:32 | false
5       | 18.0       | card   | abc123    | 2025-08-30 20:46 | true
6       | 16.0       | card   | abc123    | 2025-08-31 14:31 | true
7       | 22.0       | cash   | NULL      | 2025-08-31 20:41 | false
8       | 0.0        | card   | xyz777    | 2025-09-01 02:12 | false

devices
user_type | user_id | device_id
rider     | 1       | devA
rider     | 2       | devB
rider     | 3       | devC
rider     | 4       | devB
driver    | 10      | devX
driver    | 11      | devY
driver    | 12      | devZ
Write SQL answers for A–D and a Pandas answer for E. Illustrative solution sketches for each task follow the list.

A) For the last 7 days (2025-08-26 to 2025-09-01 inclusive), compute per city: total requests, completed trips, and completion rate = completed / requests. Count a request if a row exists in trips (any status). Order by completion rate ascending. Handle NULL times robustly and ensure date filtering uses request_time in UTC.

B) For each driver, over the last 7 days, compute the ratio: avg surge during 20:00–21:59 divided by avg surge during 14:00–15:59 on their completed trips, within the same city-day buckets. Return drivers where both windows have at least 1 completed trip and the ratio > 1.5. Include driver_id, city, counts per window, both averages, and the ratio.

C) Over the last 7 days, compute the median pickup wait (pickup_time - request_time) per city for completed trips, after excluding trips above the city-specific 95th percentile wait. Use window functions to compute the percentile cutoff and the median on the truncated set.

D) Identify likely duplicate rider accounts in the last 30 days: output rider_id_a, rider_id_b (a < b), evidence_type ('device' or 'card'), evidence_value (device_id or card_hash), and first_seen_time. A pair qualifies if the riders share either the same device_id in devices or the same payments.card_hash used on trips by different rider_ids. Exclude NULL evidence values. For 'card', join payments→trips to map card_hash to rider_id.

E) Pandas: Given a DataFrame trips_df of trips, compute 7-day new-user retention by cohort for riders whose first completed trip date is between 2025-08-25 and 2025-08-31. A rider is retained if they have ≥1 additional completed trip with dropoff_time within 7 days (inclusive) after their first completed trip. Return a DataFrame with cohort_date, n_new, n_retained, and retention_rate, sorted by cohort_date.
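A — a minimal sketch, assuming a PostgreSQL-style dialect (FILTER clause, ::numeric cast) and that request_time is already stored as a UTC timestamp:

-- A: per-city requests, completions, and completion rate, 2025-08-26 through 2025-09-01 (UTC).
SELECT
    city,
    COUNT(*)                                               AS total_requests,
    COUNT(*) FILTER (WHERE status = 'completed')           AS completed_trips,
    COUNT(*) FILTER (WHERE status = 'completed')::numeric
        / NULLIF(COUNT(*), 0)                              AS completion_rate
FROM trips
WHERE request_time >= TIMESTAMP '2025-08-26 00:00:00'
  AND request_time <  TIMESTAMP '2025-09-02 00:00:00'      -- keeps all of 2025-09-01
GROUP BY city
ORDER BY completion_rate ASC;

Counting on status alone keeps rows with NULL pickup_time or dropoff_time (cancellations, no-shows) in the request totals, and NULLIF guards against a zero denominator.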
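B — one reading of the task, treating a "city-day bucket" as (driver_id, city, request date) and assigning the time window from the hour of request_time; PostgreSQL syntax assumed:

-- B: evening (20:00-21:59) vs. afternoon (14:00-15:59) surge ratio per driver/city/day bucket.
WITH windowed AS (
    SELECT
        driver_id,
        city,
        request_time::date AS trip_day,
        surge_multiplier,
        CASE
            WHEN EXTRACT(HOUR FROM request_time) IN (20, 21) THEN 'evening'
            WHEN EXTRACT(HOUR FROM request_time) IN (14, 15) THEN 'afternoon'
        END AS win
    FROM trips
    WHERE status = 'completed'
      AND request_time >= TIMESTAMP '2025-08-26 00:00:00'
      AND request_time <  TIMESTAMP '2025-09-02 00:00:00'
),
agg AS (
    SELECT
        driver_id,
        city,
        trip_day,
        COUNT(*) FILTER (WHERE win = 'evening')                 AS n_evening,
        COUNT(*) FILTER (WHERE win = 'afternoon')               AS n_afternoon,
        AVG(surge_multiplier) FILTER (WHERE win = 'evening')    AS avg_surge_evening,
        AVG(surge_multiplier) FILTER (WHERE win = 'afternoon')  AS avg_surge_afternoon
    FROM windowed
    WHERE win IS NOT NULL
    GROUP BY driver_id, city, trip_day
)
SELECT
    driver_id,
    city,
    trip_day,
    n_evening,
    n_afternoon,
    avg_surge_evening,
    avg_surge_afternoon,
    avg_surge_evening / avg_surge_afternoon AS surge_ratio
FROM agg
WHERE n_evening >= 1
  AND n_afternoon >= 1
  AND avg_surge_evening / avg_surge_afternoon > 1.5;

If the intended output is one row per driver and city rather than per day, the same query works with trip_day dropped from the grouping and select list.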
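C — a sketch in PostgreSQL, where PERCENTILE_CONT is an ordered-set aggregate rather than a window function; the city-specific cutoff is therefore taken from the CUME_DIST() window function, keeping waits at or below the 95th percentile:

-- C: median pickup wait (minutes) per city after trimming the top 5% of waits.
WITH waits AS (
    SELECT
        city,
        EXTRACT(EPOCH FROM (pickup_time - request_time)) / 60.0 AS wait_min
    FROM trips
    WHERE status = 'completed'
      AND pickup_time IS NOT NULL                  -- guard against NULL times
      AND request_time >= TIMESTAMP '2025-08-26 00:00:00'
      AND request_time <  TIMESTAMP '2025-09-02 00:00:00'
),
ranked AS (
    SELECT
        city,
        wait_min,
        CUME_DIST() OVER (PARTITION BY city ORDER BY wait_min) AS pct
    FROM waits
)
SELECT
    city,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY wait_min) AS median_wait_min
FROM ranked
WHERE pct <= 0.95                                   -- drop waits above the city's 95th percentile
GROUP BY city;

In a dialect where PERCENTILE_CONT is allowed as an analytic function over PARTITION BY city (SQL Server, Oracle, BigQuery), the cutoff can be computed directly as a window expression instead of via CUME_DIST().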
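D — a sketch, PostgreSQL assumed. devices carries no timestamp, so the 30-day window (back to 2025-08-02) is applied to card evidence via charge_time; first_seen_time is interpreted as the later signup_date for a shared device and the earliest charge_time for a shared card:

-- D: candidate duplicate rider pairs sharing a device_id or a card_hash.
WITH device_pairs AS (
    SELECT
        d1.user_id                                           AS rider_id_a,
        d2.user_id                                           AS rider_id_b,
        'device'                                             AS evidence_type,
        d1.device_id                                         AS evidence_value,
        GREATEST(r1.signup_date, r2.signup_date)::timestamp  AS first_seen_time
    FROM devices d1
    JOIN devices d2
      ON  d2.device_id = d1.device_id
      AND d2.user_type = 'rider'
      AND d2.user_id   > d1.user_id                -- enforces a < b and distinct riders
    JOIN riders r1 ON r1.rider_id = d1.user_id
    JOIN riders r2 ON r2.rider_id = d2.user_id
    WHERE d1.user_type = 'rider'
      AND d1.device_id IS NOT NULL
),
card_pairs AS (
    SELECT
        t1.rider_id                                 AS rider_id_a,
        t2.rider_id                                 AS rider_id_b,
        'card'                                      AS evidence_type,
        p1.card_hash                                AS evidence_value,
        MIN(LEAST(p1.charge_time, p2.charge_time))  AS first_seen_time
    FROM payments p1
    JOIN trips    t1 ON t1.trip_id = p1.trip_id
    JOIN payments p2 ON p2.card_hash = p1.card_hash
    JOIN trips    t2 ON t2.trip_id = p2.trip_id
    WHERE p1.card_hash IS NOT NULL
      AND t1.rider_id < t2.rider_id                -- different riders, a < b
      AND p1.charge_time >= TIMESTAMP '2025-08-02 00:00:00'
      AND p2.charge_time >= TIMESTAMP '2025-08-02 00:00:00'
    GROUP BY t1.rider_id, t2.rider_id, p1.card_hash
)
SELECT * FROM device_pairs
UNION
SELECT * FROM card_pairs
ORDER BY rider_id_a, rider_id_b, evidence_type;

On the sample data this flags riders 2 and 4 (shared device devB and shared card xyz777) and riders 1 and 3 (shared card abc123).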
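E — a sketch, assuming trips_df carries rider_id, status, and dropoff_time (datetime-like), and that the cohort date is the calendar date of the first completed trip's dropoff_time; the helper name seven_day_retention is made up for illustration:

# E: 7-day new-rider retention by cohort (hypothetical helper name).
import pandas as pd

def seven_day_retention(trips_df: pd.DataFrame) -> pd.DataFrame:
    df = trips_df.copy()
    df["dropoff_time"] = pd.to_datetime(df["dropoff_time"])
    completed = df[df["status"] == "completed"]

    # First completed trip per rider defines the cohort date.
    first = (completed.groupby("rider_id")["dropoff_time"]
                      .min()
                      .rename("first_dropoff")
                      .reset_index())
    first["cohort_date"] = first["first_dropoff"].dt.date

    # Keep only cohorts whose first completed trip falls in the requested window.
    in_window = ((first["cohort_date"] >= pd.Timestamp("2025-08-25").date()) &
                 (first["cohort_date"] <= pd.Timestamp("2025-08-31").date()))
    cohorts = first[in_window]

    # Retained = at least one later completed trip ending within 7 days (inclusive).
    merged = completed.merge(cohorts, on="rider_id", how="inner")
    retained_mask = ((merged["dropoff_time"] > merged["first_dropoff"]) &
                     (merged["dropoff_time"] <= merged["first_dropoff"] + pd.Timedelta(days=7)))
    retained_ids = merged.loc[retained_mask, "rider_id"].unique()

    out = (cohorts.assign(retained=cohorts["rider_id"].isin(retained_ids))
                  .groupby("cohort_date")
                  .agg(n_new=("rider_id", "nunique"),
                       n_retained=("retained", "sum"))
                  .reset_index())
    out["retention_rate"] = out["n_retained"] / out["n_new"]
    return out.sort_values("cohort_date").reset_index(drop=True)

If "first completed trip date" should instead be keyed on request_time or pickup_time, only the column used to build first_dropoff changes.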