Write SQL Data Transformation Queries
Company: Tesla
Role: Data Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency in SQL data transformation and aggregation, testing competencies such as monthly-reset running totals, hierarchical recursion for reporting structures, and single-pass conditional aggregations.
Part 1: Monthly-Reset Cumulative Sales
Constraints
- 0 <= len(sales) <= 200000
- `date` is always a valid ISO date string in `YYYY-MM-DD` format
- 1 <= store_id <= 10^9
- 0 <= amount <= 10^9
- Input may contain multiple rows with the same `(store_id, date)`
Examples
Input: [('2023-01-01', 1, 100), ('2023-01-01', 1, 50), ('2023-01-02', 1, 25), ('2023-02-01', 1, 10), ('2023-01-01', 2, 80), ('2023-01-03', 2, 20), ('2023-02-02', 2, 5)]
Expected Output: [(1, '2023-01-01', 150), (1, '2023-01-02', 175), (1, '2023-02-01', 10), (2, '2023-01-01', 80), (2, '2023-01-03', 100), (2, '2023-02-02', 5)]
Explanation: Duplicate rows for store 1 on 2023-01-01 are merged first. Running totals reset in February for each store.
Input: [('2023-03-05', 1, 40), ('2023-02-28', 1, 60), ('2023-03-01', 1, 10)]
Expected Output: [(1, '2023-02-28', 60), (1, '2023-03-01', 10), (1, '2023-03-05', 50)]
Explanation: Input is unsorted. The March running total starts over from 0.
Input: []
Expected Output: []
Explanation: Edge case: no sales records.
Input: [('2024-12-31', 3, 7)]
Expected Output: [(3, '2024-12-31', 7)]
Explanation: Edge case: a single aggregated row remains unchanged.
Hints
- Aggregate duplicate `(store_id, date)` rows before computing any running total.
- Because dates are in `YYYY-MM-DD` format, lexicographic sorting matches chronological order.
Part 2: Full Reporting Hierarchy
Constraints
- 0 <= len(organization) <= 200000
- Employee IDs are unique
- Each employee has at most one manager
- The management structure contains no cycles
- If employee `1` does not exist, return an empty list
Examples
Input: [(1, None, 'CEO'), (2, 1, 'Alice'), (3, 1, 'Bob'), (4, 2, 'Cara'), (5, 2, 'Dan'), (6, 4, 'Eve'), (7, 3, 'Finn')]
Expected Output: [(2, 1, 'Alice', 1, '1->2'), (3, 1, 'Bob', 1, '1->3'), (4, 2, 'Cara', 2, '1->2->4'), (5, 2, 'Dan', 2, '1->2->5'), (7, 3, 'Finn', 2, '1->3->7'), (6, 4, 'Eve', 3, '1->2->4->6')]
Explanation: Employees 2 and 3 report directly to 1. The rest are found by traversing deeper levels.
Input: [(2, 1, 'Alice'), (3, 2, 'Bob')]
Expected Output: []
Explanation: Edge case: employee 1 is missing, so no hierarchy rooted at 1 can be formed.
Input: [(1, None, 'CEO')]
Expected Output: []
Explanation: Edge case: employee 1 exists but has no reports.
Input: [(4, 2, 'Dev'), (2, 1, 'VP'), (1, None, 'CEO'), (3, None, 'OtherRoot'), (5, 3, 'X')]
Expected Output: [(2, 1, 'VP', 1, '1->2'), (4, 2, 'Dev', 2, '1->2->4')]
Explanation: Only employees connected under manager 1 are included; the separate tree rooted at employee 3 is ignored.
Hints
- Build a graph from each manager to their direct reports.
- A BFS or DFS can carry both the current level and the path built so far.
Part 3: Conditional Order Metrics by Date
Constraints
- 0 <= len(orders) <= 200000
- `status` is always one of `pending`, `completed`, or `cancelled`
- 0 <= amount <= 10^9
- `date` is always a valid ISO date string in `YYYY-MM-DD` format
Examples
Input: [(101, 'completed', 100, '2023-01-01'), (102, 'pending', 50, '2023-01-01'), (103, 'cancelled', 70, '2023-01-01'), (104, 'completed', 30, '2023-01-02'), (105, 'pending', 20, '2023-01-02'), (106, 'completed', 10, '2023-01-01')]
Expected Output: [('2023-01-01', 110, 1, 1), ('2023-01-02', 30, 1, 0)]
Explanation: Completed amounts are summed into revenue; pending and cancelled orders are counted separately.
Input: [(1, 'pending', 5, '2023-03-02'), (2, 'cancelled', 7, '2023-03-01'), (3, 'pending', 4, '2023-03-01')]
Expected Output: [('2023-03-01', 0, 1, 1), ('2023-03-02', 0, 1, 0)]
Explanation: Dates with no completed orders still appear with revenue 0.
Input: []
Expected Output: []
Explanation: Edge case: no orders.
Input: [(999, 'completed', 999, '2024-12-31')]
Expected Output: [('2024-12-31', 999, 0, 0)]
Explanation: Edge case: one completed order on a single day.
Hints
- For each date, keep three running metrics in the same dictionary entry.
- You do not need three separate passes over the input.