PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates SQL and Pandas data manipulation competencies, including joins, time-based filtering, handling NULLs, aggregations, rate and ratio calculations, and windowed comparisons over grouped city-day data.

  • Medium
  • Uber
  • Data Manipulation (SQL/Python)
  • Data Scientist

Write SQL and Pandas for Uber Trips

Company: Uber

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Assume 'today' = 2025-09-01. You are given the following schema and small ASCII samples. Tables - riders(rider_id, name, signup_date) - drivers(driver_id, name, car_year) - trips(trip_id, rider_id, driver_id, request_time, pickup_time, dropoff_time, status, city, surge_multiplier, distance_km, fare_usd) • status ∈ {'completed','cancelled_by_rider','cancelled_by_driver','driver_no_show'} - payments(trip_id, amount_usd, method, card_hash, charge_time, success) - devices(user_type, user_id, device_id) where user_type ∈ {'rider','driver'} Samples (minimal, not exhaustive) riders rider_id | name | signup_date 1 | Alice | 2025-08-20 2 | Bob | 2025-08-28 3 | Chen | 2025-08-30 4 | Deepa | 2025-07-15 drivers driver_id | name | car_year 10 | Diego | 2018 11 | Elena | 2021 12 | Farid | 2015 trips trip_id | rider_id | driver_id | request_time | pickup_time | dropoff_time | status | city | surge_multiplier | distance_km | fare_usd 1 | 1 | 10 | 2025-08-25 19:55 | 2025-08-25 20:05 | 2025-08-25 20:25 | completed | SF | 1.8 | 8.0 | 24.5 2 | 2 | 11 | 2025-08-26 14:10 | 2025-08-26 14:15 | 2025-08-26 14:30 | completed | SF | 1.0 | 5.0 | 12.0 3 | 2 | 11 | 2025-08-26 20:05 | 2025-08-26 20:30 | NULL | cancelled_by_driver | SF | 2.0 | 0.0 | 0.0 4 | 3 | 12 | 2025-08-27 02:30 | NULL | NULL | cancelled_by_rider | NYC | 1.0 | 0.0 | 0.0 5 | 3 | 10 | 2025-08-30 20:15 | 2025-08-30 20:25 | 2025-08-30 20:45 | completed | NYC | 1.7 | 6.5 | 18.0 6 | 1 | 12 | 2025-08-31 14:05 | 2025-08-31 14:10 | 2025-08-31 14:30 | completed | SF | 1.0 | 7.0 | 16.0 7 | 4 | 10 | 2025-08-31 20:05 | 2025-08-31 20:20 | 2025-08-31 20:40 | completed | SF | 1.6 | 9.0 | 22.0 8 | 4 | 11 | 2025-09-01 02:10 | NULL | NULL | driver_no_show | SF | 1.0 | 0.0 | 0.0 payments trip_id | amount_usd | method | card_hash | charge_time | success 1 | 24.5 | card | abc123 | 2025-08-25 20:26 | true 2 | 12.0 | card | xyz777 | 2025-08-26 14:31 | true 3 | 0.0 | card | xyz777 | 2025-08-26 20:31 | false 4 | 0.0 | card | qwe555 | 2025-08-27 02:32 | false 5 | 18.0 | card | abc123 | 2025-08-30 20:46 | true 6 | 16.0 | card | abc123 | 2025-08-31 14:31 | true 7 | 22.0 | cash | NULL | 2025-08-31 20:41 | false 8 | 0.0 | card | xyz777 | 2025-09-01 02:12 | false devices user_type | user_id | device_id rider | 1 | devA rider | 2 | devB rider | 3 | devC rider | 4 | devB driver | 10 | devX driver | 11 | devY driver | 12 | devZ Write SQL answers for A–D and a Pandas answer for E: A) For the last 7 days (2025-08-26 to 2025-09-01 inclusive), compute per city: total requests, completed trips, and completion rate = completed / requests. Count a request if a row exists in trips (any status). Order by completion rate ascending. Handle NULL times robustly and ensure date filtering uses request_time in UTC. B) For each driver, over the last 7 days, compute the ratio: avg surge during 20:00–21:59 divided by avg surge during 14:00–15:59 on their completed trips, within the same city-day buckets. Return drivers where both windows have at least 1 completed trip and the ratio > 1.5. Include driver_id, city, counts per window, both averages, and the ratio. C) Over the last 7 days, compute the median pickup wait (pickup_time - request_time) per city for completed trips, after excluding trips above the city-specific 95th percentile wait. Use window functions to compute the percentile cutoff and the median on the truncated set. D) Identify likely duplicate rider accounts in the last 30 days: output rider_id_a, rider_id_b (a<b), evidence_type ('device' or 'card'), evidence_value (device_id or card_hash), first_seen_time. A pair qualifies if the riders share either the same device_id in devices or the same payments.card_hash used on trips by different rider_ids. Exclude NULL evidence values. For 'card', join payments→trips to map card_hash to rider_id. E) Pandas: Given a DataFrame trips_df of trips, compute 7-day new-user retention by cohort for riders whose first completed trip date is between 2025-08-25 and 2025-08-31. A rider is retained if they have ≥1 additional completed trip with dropoff_time within 7 days (inclusive) after their first completed trip. Return a DataFrame with cohort_date, n_new, n_retained, retention_rate, sorted by cohort_date.

Quick Answer: This question evaluates SQL and Pandas data manipulation competencies, including joins, time-based filtering, handling NULLs, aggregations, rate and ratio calculations, and windowed comparisons over grouped city-day data.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Transform DataFrame and compute diff-in-diff - Uber (easy)
  • Write SQL for active counts and YTD top driver - Uber (Medium)
  • Compute ETA shift and conversion uplift - Uber (Medium)
  • Write SQL/Python for CTR analytics - Uber (Medium)
  • Clean, split, merge, and aggregate with pandas - Uber (Medium)