Determine Growth of Pirated Theme Installations Over Years
Company: Shopify
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
merchants
+-------------+------------+
| merchant_id | join_date |
+-------------+------------+
| 1 | 2020-01-02 |
| 2 | 2020-03-11 |
| 3 | 2021-05-20 |
| 4 | 2021-07-15 |
| 5 | 2022-02-22 |
+-------------+------------+
themes
+----------+------------+
| theme_id | is_pirated |
+----------+------------+
| 101 | TRUE |
| 102 | FALSE |
| 103 | TRUE |
| 104 | FALSE |
| 105 | TRUE |
+----------+------------+
merchant_themes
+-------------+----------+------------+
| merchant_id | theme_id | install_dt |
+-------------+----------+------------+
| 1 | 101 | 2020-01-02 |
| 1 | 102 | 2020-06-01 |
| 2 | 103 | 2021-02-10 |
| 3 | 104 | 2021-07-01 |
| 4 | 105 | 2022-03-11 |
+-------------+----------+------------+
##### Scenario
Shopify wants to understand if the proportion of merchants installing pirated storefront themes is growing over time.
##### Question
Write a SQL query that returns, for each calendar year, the percentage of active merchants who installed at least one pirated theme. From this result, determine whether adoption of pirated themes is increasing year-over-year.
##### Hints
Join theme installations to theme attributes, aggregate distinct merchant_ids by year, then compare year-over-year percentages.
Quick Answer: This question evaluates proficiency in SQL-based data manipulation and analytics, focusing on computing time-based cohort percentages and conducting year-over-year trend analysis of merchant behavior.