Analyze Pirated Theme Usage
Company: Shopify
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: medium
Interview Round: Technical Screen
You work on the Shopify themes team. Some themes are known to be pirated copies of legitimate themes. Write SQL to analyze merchant usage of pirated themes and the revenue impact over time.
Assume all timestamps are stored in UTC, and all monthly reporting should use calendar months in UTC. Use the full date range available in `date_dim` unless otherwise specified.
### Tables
`pirated_theme`
- `theme_id BIGINT`: ID of a theme known to be pirated. One row per pirated theme.
`theme_dim`
- `theme_id BIGINT`: Unique theme ID.
- `created_at TIMESTAMP`: Timestamp when the theme was created.
- `monthly_revenue NUMERIC`: Monthly revenue in USD that Shopify or the partner would earn from one legitimate active install of this theme.
- `partner_id BIGINT`: ID of the theme partner or developer.
`shop_install_dim`
- `install_id BIGINT`: Unique installation ID.
- `theme_id BIGINT`: Theme installed by the shop.
- `shop_id BIGINT`: Shop that installed the theme.
- `start_from TIMESTAMP`: Timestamp when the install became active.
- `start_to TIMESTAMP NULL`: Timestamp when the install ended. `NULL` means the install is still active. Treat install intervals as `[start_from, start_to)`.
`date_dim`
- `date DATE`: Calendar date.
- `month DATE`: First day of the calendar month for `date`.
`shop_dim`
- The table may exist, but it is not required for this analysis.
### Definitions
An install is considered active in a month if its active interval overlaps the month:
- `start_from < first_day_of_next_month`, and
- `COALESCE(start_to, far_future_timestamp) > first_day_of_month`.
A pirated install is an active install whose `theme_id` appears in `pirated_theme`.
### Questions
1. Is merchants' usage of pirated themes increasing over time? Define an appropriate monthly usage metric and produce a month-level trend.
- Include at least the following output columns:
- `month`
- `active_pirated_installs`
- `active_pirated_shops`
- `active_total_installs`
- `pirated_install_share`
2. Estimate revenue lost over time due to shops installing pirated themes.
- For each active pirated install in a month, count `theme_dim.monthly_revenue` as estimated lost monthly revenue.
- Include at least the following output columns:
- `month`
- `active_pirated_installs`
- `estimated_lost_revenue_usd`
Quick Answer: This question evaluates competency in temporal data manipulation, interval-overlap logic, aggregations and joins to calculate monthly usage metrics and estimate revenue impact from pirated theme installations on an e-commerce theme platform.