Use today = 2025-09-01. Consider NYC taxi trip data over the last 7 days inclusive (2025-08-26 to 2025-09-01, America/New_York). You receive two datasets and must write a fast, vectorized analysis (no Python for-loops over rows). Data schema and tiny samples:
trips(id, taxi_id, pickup_ts, dropoff_ts, pickup_zone_id, dropoff_zone_id, distance_miles, fare_amount)
1 | 101 | 2025-08-26 08:15 | 2025-08-26 08:45 | 1 | 3 | 6.0 | 18.50
2 | 102 | 2025-08-26 00:20 | 2025-08-26 00:50 | 2 | 1 | 4.0 | 14.00
3 | 101 | 2025-08-28 01:10 | 2025-08-28 01:40 | 1 | 1 | 3.0 | 12.00
4 | 103 | 2025-08-30 17:05 | 2025-08-30 17:25 | 4 | 2 | 2.5 | 9.50
5 | 104 | 2025-09-01 02:30 | 2025-09-01 03:20 | 1 | 4 | 10.0 | 30.00
6 | 102 | 2025-08-31 23:50 | 2025-09-01 00:10 | 3 | 3 | 5.0 | 16.00
zones(zone_id, borough)
1 | Manhattan
2 | Brooklyn
3 | Queens
4 | Bronx
Tasks:
-
After joining trips with zones on pickup_zone_id, compute per (borough, hour_of_day from pickup_ts) the median trip speed in mph, where speed = distance_miles / duration_hours. Filter trips to 1 ≤ duration_minutes ≤ 120 and 1 ≤ speed ≤ 80. Return the top 3 (borough, hour) pairs by median speed; break ties by borough asc, then hour asc. Report the exact (borough, hour, median_speed_mph) triples. 2) For trips with pickup borough = 'Manhattan' and pickup time between 00:00 and 05:00 inclusive, identify the 3 taxi_id with the largest 95th percentile of trip duration (minutes) over the same date range; break ties by taxi_id asc. Clearly define how you compute the 95th percentile (e.g., pandas/numpy method) and use a stable, vectorized approach. 3) Provide pandas code (or SQL) that runs in O(n log n) or better due to grouping/quantile operations, avoids per-row loops, and uses: parsed datetime dtypes; one-to-many join performed once; categorical dtype for borough; appropriate indexing on pickup_ts for time filtering. 4) Briefly justify two memory/performance optimizations you employ (e.g., downcasting floats/ints, using groupby-agg with quantile in a single pass, avoiding intermediate copies).