Top 5 Most Efficient Vehicle Models
Company: Waymo
Role: Data Analyst
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Technical Screen
## Top 5 Most Efficient Vehicle Models
You work with an autonomous-vehicle fleet's operational database. Each physical vehicle belongs to a model (e.g., a specific make/model/year configuration). The fleet logs every driving trip (miles driven and energy consumed in kWh) and every maintenance event (with its cost).
Write a SQL query that returns the **top 5 most efficient vehicle models**, where efficiency is defined as:
$$\text{efficiency} = \frac{\text{total miles driven across all vehicles of the model}}{\text{total kWh consumed across all vehicles of the model}}$$
(i.e., **miles per kWh**, higher is better).
A model is only eligible if the **total maintenance cost across all vehicles of that model is less than 5000**. Crucially, a vehicle (and therefore a model) that has **no maintenance records at all** has a total maintenance cost of 0, so it is eligible and **must not be dropped** from consideration.
The query must be **production-ready**: it must not crash or produce a divide-by-zero error when a model's total energy consumption is `0` or `NULL`. Any model whose total energy is `0` or `NULL` cannot have a defined efficiency and should be **excluded** from the ranked output (it is not "infinitely efficient").
### Schema
**`vehicle_model`** — one row per distinct model.
| column | type | notes |
|------------|---------------|--------------------------------|
| model_id | INTEGER | Primary key. |
| model_name | VARCHAR | Human-readable model name. |
**`vehicle`** — one row per physical vehicle. Many vehicles per model.
| column | type | notes |
|------------|----------|--------------------------------------------------------|
| vehicle_id | INTEGER | Primary key. |
| model_id | INTEGER | Foreign key → `vehicle_model.model_id`. |
**`trip`** — one row per completed driving trip.
| column | type | notes |
|-----------------|---------------|--------------------------------------------------|
| trip_id | INTEGER | Primary key. |
| vehicle_id | INTEGER | Foreign key → `vehicle.vehicle_id`. |
| miles_driven | NUMERIC | Miles driven on this trip ($\ge 0$). |
| energy_kwh | NUMERIC | Energy consumed on this trip in kWh; may be `0` or `NULL`. |
**`maintenance`** — one row per maintenance event. A vehicle may have zero, one, or many maintenance events.
| column | type | notes |
|-----------------|---------------|--------------------------------------------------|
| maintenance_id | INTEGER | Primary key. |
| vehicle_id | INTEGER | Foreign key → `vehicle.vehicle_id`. |
| cost | NUMERIC | Cost of this maintenance event. |
### Relationships
- `vehicle_model` 1 — N `vehicle` (a model has many vehicles).
- `vehicle` 1 — N `trip` (a vehicle has many trips).
- `vehicle` 1 — N `maintenance` (a vehicle has zero or more maintenance events).
### Required Output
Return exactly these columns, ordered by efficiency descending, limited to 5 rows:
| column | description |
|------------|----------------------------------------------------------|
| model_id | The model's id. |
| model_name | The model's name. |
| efficiency | Total miles ÷ total kWh for all vehicles of the model (miles per kWh). |
### Notes & Assumptions
- Aggregate miles, energy, and maintenance cost **at the model level** (sum across every vehicle of the model and every trip/maintenance event of those vehicles).
- A model with no trips at all has no defined efficiency (total energy is `NULL`/`0`) and is excluded.
- Maintenance and trips must be aggregated independently so that joining one does not fan out / double-count the other.
- Ties at the 5th position may be broken arbitrarily.
Quick Answer: This SQL question tests practical data manipulation skills including multi-table aggregation, NULL handling, and conditional filtering across a relational schema. It assesses the ability to correctly join independent aggregations to avoid row fan-out and apply production-ready safeguards like divide-by-zero protection, a common requirement in data analyst interviews.