Analyze sales with groupby
Company: Uber
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency in data manipulation and analytics using pandas, specifically the ability to join relational tables and apply groupby aggregations to compute business metrics like total sales, average sales per order, and completion rates.
Constraints
- 0 <= len(orders), len(products), len(categories) <= 100000
- quantity is a non-negative integer and unit_price is a non-negative number
- Use inner-join semantics: unmatched orders or products are ignored
- Only categories with at least one matched order row should appear in the output
Examples
Input: ([{'order_id': 1, 'product_id': 101, 'quantity': 2, 'unit_price': 10.0, 'status': 'completed'}, {'order_id': 2, 'product_id': 102, 'quantity': 1, 'unit_price': 20.0, 'status': 'pending'}, {'order_id': 3, 'product_id': 103, 'quantity': 3, 'unit_price': 5.0, 'status': 'completed'}, {'order_id': 4, 'product_id': 101, 'quantity': 1, 'unit_price': 10.0, 'status': 'cancelled'}], [{'product_id': 101, 'category_id': 1}, {'product_id': 102, 'category_id': 2}, {'product_id': 103, 'category_id': 1}], [{'category_id': 1, 'category_name': 'Fast Food'}, {'category_id': 2, 'category_name': 'Drinks'}])
Expected Output: [{'category_name': 'Drinks', 'total_sales': 20.0, 'average_sales_per_order': 20.0, 'completion_rate': 0.0}, {'category_name': 'Fast Food', 'total_sales': 45.0, 'average_sales_per_order': 15.0, 'completion_rate': 0.6666666666666666}]
Explanation: Fast Food has order-line sales of 20, 15, and 10, so total_sales = 45 and average_sales_per_order = 15. Two of its three rows are completed, so completion_rate = 2/3. Drinks has one non-completed row with sales 20.
Input: ([], [{'product_id': 1, 'category_id': 10}], [{'category_id': 10, 'category_name': 'Pizza'}])
Expected Output: []
Explanation: There are no order rows to aggregate, so the result is empty.
Input: ([{'order_id': 10, 'product_id': 201, 'quantity': 1, 'unit_price': 3.5, 'status': 'completed'}, {'order_id': 11, 'product_id': 202, 'quantity': 2, 'unit_price': 4.0, 'status': 'completed'}, {'order_id': 12, 'product_id': 203, 'quantity': 1, 'unit_price': 10.0, 'status': 'failed'}], [{'product_id': 201, 'category_id': 1}, {'product_id': 202, 'category_id': 1}, {'product_id': 203, 'category_id': 2}], [{'category_id': 1, 'category_name': 'Desserts'}, {'category_id': 2, 'category_name': 'Sushi'}])
Expected Output: [{'category_name': 'Desserts', 'total_sales': 11.5, 'average_sales_per_order': 5.75, 'completion_rate': 1.0}, {'category_name': 'Sushi', 'total_sales': 10.0, 'average_sales_per_order': 10.0, 'completion_rate': 0.0}]
Explanation: Desserts contains two completed rows with sales 3.5 and 8.0, so total is 11.5 and the average is 5.75. Sushi has one failed row with sales 10.0, so its completion rate is 0.0.
Input: ([{'order_id': 1, 'product_id': 301, 'quantity': 1, 'unit_price': 12.0, 'status': 'completed'}, {'order_id': 1, 'product_id': 302, 'quantity': 2, 'unit_price': 3.0, 'status': 'completed'}, {'order_id': 2, 'product_id': 302, 'quantity': 1, 'unit_price': 3.0, 'status': 'pending'}], [{'product_id': 301, 'category_id': 1}, {'product_id': 302, 'category_id': 1}], [{'category_id': 1, 'category_name': 'Snacks'}, {'category_id': 2, 'category_name': 'Unused'}])
Expected Output: [{'category_name': 'Snacks', 'total_sales': 21.0, 'average_sales_per_order': 7.0, 'completion_rate': 0.6666666666666666}]
Explanation: All matched rows belong to Snacks. Even though order_id 1 appears twice, both lines are counted separately. Sales are 12.0, 6.0, and 3.0, so total is 21.0 and average is 7.0. Two of the three rows are completed.
Hints
- Build lookup maps such as product_id -> category_id and category_id -> category_name so you can simulate the joins efficiently.
- While scanning the orders, keep per-category aggregates: total sales, number of rows, and number of completed rows.