Compute Ride Metrics in SQL
Company: Waymo
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Take-home Project
You are given two tables for a ride-hailing product.
Tables:
- `users`: one row per registered user.
- `user_id` BIGINT PRIMARY KEY
- `city` VARCHAR NOT NULL
- `rides`: one row per completed ride.
- `ride_id` BIGINT PRIMARY KEY
- `ride_date` TIMESTAMP NOT NULL
- `ride_rating` DECIMAL, nullable
- `user_id` BIGINT NOT NULL, foreign key to `users.user_id`
Assumptions:
- All users in `users` are eligible users, including users who signed up but never took a ride.
- All date filters and calendar-month groupings use the `America/Los_Angeles` timezone. If `ride_date` is already a DATE rather than a TIMESTAMP, treat it as already converted to that timezone.
- A ride is counted by counting rows in `rides`.
- SQL `AVG` should ignore NULL ratings. If a group has no non-null ratings, the average rating may be NULL.
Write SQL queries for the following tasks:
1. **Inactive User Analysis**
Count users with zero rides in the 7-day window ending on July 7, 2024, inclusive. The window is July 1, 2024 through July 7, 2024. Users with no rows in `rides` must be counted as inactive.
- Required output column: `inactive_user_count`
2. **Monthly Ride Aggregation**
For each calendar month with at least one ride, compute total rides and average ride rating, rounded to 2 decimal places. Sort by month descending.
- Required output columns: `month_start`, `total_rides`, `avg_ride_rating`
3. **Low-Frequency Users by City**
Count unique users who have taken 0 or 1 total lifetime rides, grouped by city. Users who never took a ride must be included.
- Required output columns: `city`, `low_frequency_user_count`
4. **Retention Rating by Active Month**
For each user, define the user's active months as the distinct calendar months in which the user had at least one ride, ordered chronologically. Inactive calendar months are skipped. For example, if a user rode in January, March, and June, then January is active month 1, March is active month 2, and June is active month 3.
Compute the average ride rating for rides that occurred in users' 1st active month and 3rd active month, grouped by city. The average must be computed across all ride rows, not as an average of per-user averages.
- Required output columns: `city`, `active_month_number`, `avg_ride_rating`
- Include only `active_month_number` values 1 and 3.
Quick Answer: This question evaluates SQL data manipulation skills, covering joins between normalized tables, timezone-aware date filtering and calendar-month grouping, aggregation and NULL-aware averages, counting distinct and inactive users, and cohort/retention analysis.