Design Scalable Database and Analyze E-commerce Data
Company: Google
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
transactions
+-----------+----------+------------+------------+
| user_id | order_id | product_id | order_time |
+-----------+----------+------------+------------+
| 101 | 5001 | 23 | 2023-09-01 |
| 101 | 5001 | 45 | 2023-09-01 |
| 102 | 5002 | 23 | 2023-09-01 |
| 103 | 5003 | 67 | 2023-09-02 |
| 103 | 5003 | 45 | 2023-09-02 |
+-----------+----------+------------+------------+
##### Scenario
A video-streaming start-up needs a scalable database; later you are asked to solve quick data-ops questions for an e-commerce team.
##### Question
Design a relational database for the video company: what tables and key columns would you create and how are they related? Write an SQL query that returns the pair of products most frequently purchased together. Using Python/Pandas, add a new column to a DataFrame based on logical conditions from existing columns.
##### Hints
Think normalization, primary/foreign keys, group by order_id, value_counts or window functions, and vectorized Pandas operations.
Quick Answer: This question evaluates relational database design and data-manipulation competencies, including schema modeling and normalization, SQL-based association analysis for co-purchases, and vectorized data transformation using Python/Pandas.