Find daily first-order merchants with SQL
Company: Amazon
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Given the orders table below, use SQL window functions to answer the following:
A) In a single query, for each calendar date (UTC), return the merchant_id(s) whose order is the first completed order of that date across all merchants. Include date, order_id, merchant_id, and order_ts. If multiple orders share the exact earliest timestamp on a date, return all ties.
B) In a separate query, for each merchant and date, return that merchant's first completed order for that date.
Constraints: ignore rows where status <> 'completed'; avoid per-row correlated subqueries; be efficient on 100M+ rows.
Schema:
orders(order_id INT, merchant_id INT, user_id INT, order_ts TIMESTAMP, status VARCHAR)
Sample data:
order_id | merchant_id | user_id | order_ts (UTC) | status
1 | 10 | 100 | 2025-02-01 00:00:05 | completed
2 | 11 | 101 | 2025-02-01 00:00:05 | completed
3 | 10 | 102 | 2025-02-01 03:12:00 | completed
4 | 12 | 103 | 2025-02-02 00:00:01 | cancelled
5 | 11 | 104 | 2025-02-02 00:00:01 | completed
6 | 10 | 105 | 2025-02-02 00:00:01 | completed
7 | 12 | 106 | 2025-02-02 09:00:00 | completed
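One possible approach to parts A and B can be sketched against the sample rows above. This is a minimal sketch, not a reference solution. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in engine, assumes order_ts is already stored in UTC so that DATE(order_ts) yields the UTC calendar date, and assumes order_id is an acceptable tie-break in part B when one merchant has two orders at the same timestamp. The helper name run_queries is hypothetical.

```python
import sqlite3

# Sample rows copied from the table above.
ROWS = [
    (1, 10, 100, "2025-02-01 00:00:05", "completed"),
    (2, 11, 101, "2025-02-01 00:00:05", "completed"),
    (3, 10, 102, "2025-02-01 03:12:00", "completed"),
    (4, 12, 103, "2025-02-02 00:00:01", "cancelled"),
    (5, 11, 104, "2025-02-02 00:00:01", "completed"),
    (6, 10, 105, "2025-02-02 00:00:01", "completed"),
    (7, 12, 106, "2025-02-02 09:00:00", "completed"),
]

# Part A: partition by date only. RANK() gives every order tied at the
# earliest timestamp the value 1, so all ties are returned.
QUERY_A = """
WITH ranked AS (
    SELECT DATE(order_ts) AS order_date, order_id, merchant_id, order_ts,
           RANK() OVER (PARTITION BY DATE(order_ts)
                        ORDER BY order_ts) AS rnk
    FROM orders
    WHERE status = 'completed'
)
SELECT order_date, order_id, merchant_id, order_ts
FROM ranked
WHERE rnk = 1
ORDER BY order_date, order_id
"""

# Part B: partition by merchant and date. ROW_NUMBER() keeps exactly one row
# per partition; the order_id tie-break is an assumption, not part of the task.
QUERY_B = """
WITH ranked AS (
    SELECT DATE(order_ts) AS order_date, merchant_id, order_id, order_ts,
           ROW_NUMBER() OVER (PARTITION BY merchant_id, DATE(order_ts)
                              ORDER BY order_ts, order_id) AS rn
    FROM orders
    WHERE status = 'completed'
)
SELECT order_date, merchant_id, order_id, order_ts
FROM ranked
WHERE rn = 1
ORDER BY order_date, merchant_id
"""


def run_queries():
    """Load the sample rows into an in-memory database and run both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INT, merchant_id INT, user_id INT, "
        "order_ts TIMESTAMP, status VARCHAR)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", ROWS)
    part_a = conn.execute(QUERY_A).fetchall()
    part_b = conn.execute(QUERY_B).fetchall()
    conn.close()
    return part_a, part_b


if __name__ == "__main__":
    part_a, part_b = run_queries()
    for row in part_a:
        print("A:", row)
    for row in part_b:
        print("B:", row)
```

On the sample data, part A returns orders 1 and 2 for 2025-02-01 and orders 5 and 6 for 2025-02-02; the cancelled order 4 is filtered out before ranking. Filtering on status inside the CTE, ahead of the window function, keeps non-completed rows out of the sort.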
Follow-up: briefly justify your partitioning/sorting choices and any indexes you would add.
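As one illustration of an index that might be proposed for the follow-up, the sketch below creates a partial index restricted to completed orders. It is a sketch under stated assumptions: SQLite stands in for the real engine, the index name idx_orders_completed_ts is hypothetical, and whether a given planner actually uses such an index for the window sort depends on the engine. On columnar warehouses that lack conventional indexes, the analogous lever is typically a sort or partition key on order_ts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INT, merchant_id INT, user_id INT, "
    "order_ts TIMESTAMP, status VARCHAR)"
)

# Hypothetical partial index: only completed rows are indexed, ordered by
# order_ts, with merchant_id and order_id included so the ranking columns
# can be read from the index alone.
conn.execute(
    """
    CREATE INDEX idx_orders_completed_ts
    ON orders (order_ts, merchant_id, order_id)
    WHERE status = 'completed'
    """
)

# List the indexes now defined on the orders table.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'orders'"
    )
]
conn.close()
```

For the per-merchant query, a similar index leading with merchant_id, such as (merchant_id, order_ts), would match that query's partitioning more closely; this is again an assumption about what a planner can exploit, not a measured result.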
Quick Answer: This question tests SQL window functions, set-based data manipulation, timestamp-based grouping, and query optimization in the Data Manipulation (SQL/Python) domain for a Data Scientist role. It emphasizes practical application over purely conceptual understanding.