PracHub

Compute Theme Similarity

Last updated: Jun 9, 2026

Quick Overview

This question tests whether a candidate can implement a set-based similarity metric, the Jaccard coefficient, and apply it to feature comparison with deterministic matching and robust input handling. It spans the Coding & Algorithms and Data Science domains and is practical rather than conceptual: the candidate writes working functions. Problems like this check that a candidate computes similarity scores correctly, handles edge cases such as empty inputs, and applies a threshold to flag matches, here to identify custom themes that are likely pirated.



Company: Shopify

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen



Related Interview Questions

  • Compute Jaccard Similarity for Lists - Shopify (medium)
  • Implement URL Shortening Codec - Shopify (medium)
  • Simulate a rover fleet - Shopify (medium)
  • Simulate robot moves on a grid - Shopify (medium)
  • Implement a Capacity-Bounded Cache - Shopify (medium)
May 23, 2026, 12:00 AM

Implement Python functions to compare theme similarity using the Jaccard coefficient.

Part 1: Basic Jaccard similarity

Given two lists of strings, implement a function that returns their Jaccard similarity:

Jaccard(A, B) = size(intersection(A, B)) / size(union(A, B))
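In set notation, with the both-empty rule from the requirements folded in, the score can be written as below. The case where exactly one set is empty needs no separate branch: the intersection is empty while the union is not, so the fraction is 0. Because |A ∩ B| ≤ |A ∪ B|, the value always lies in [0, 1].

```latex
J(A, B) =
\begin{cases}
1, & A = B = \varnothing \\[4pt]
\dfrac{|A \cap B|}{|A \cup B|}, & \text{otherwise}
\end{cases}
```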

Requirements:

  • Return a float in [0, 1].
  • Treat the input lists as sets, so duplicate strings should not affect the score.
  • If both lists are empty, return 1.0.
  • If only one list is empty, return 0.0.
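A minimal sketch of Part 1 that meets these requirements. The function name jaccard_similarity is hypothetical, since the problem does not name the function. Converting each list to a set handles duplicates, and only the both-empty case needs an explicit branch.

```python
from collections.abc import Iterable


def jaccard_similarity(list_1: Iterable[str], list_2: Iterable[str]) -> float:
    """Return size(intersection) / size(union) of the inputs treated as sets."""
    set_1 = set(list_1)  # duplicates collapse when converting to a set
    set_2 = set(list_2)
    union = set_1 | set_2
    if not union:
        # Both inputs are empty: the requirements define this as 1.0.
        return 1.0
    # When exactly one input is empty the intersection is empty and the union
    # is not, so this already yields 0.0 without a separate branch.
    return len(set_1 & set_2) / len(union)


print(jaccard_similarity(["a", "b", "c", "d"], ["b", "c", "d", "e"]))  # 0.6
```

The explicit branch exists only because 0 / 0 is undefined; every other case falls out of the general formula.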

Example:

```python
list_1 = ["a", "b", "c", "d"]
list_2 = ["b", "c", "d", "e"]
```

The intersection is {b, c, d} (3 elements) and the union is {a, b, c, d, e} (5 elements), so the expected score is 3 / 5 = 0.6.
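The arithmetic can be checked in plain Python. This sketch also appends repeated strings to both lists to confirm that duplicates leave the score unchanged; the helper name score is hypothetical and assumes both inputs are non-empty.

```python
list_1 = ["a", "b", "c", "d"]
list_2 = ["b", "c", "d", "e"]


def score(x, y):
    # Set conversion drops duplicates; both inputs here are non-empty.
    sx, sy = set(x), set(y)
    return len(sx & sy) / len(sx | sy)


base = score(list_1, list_2)
padded = score(list_1 + ["a", "b"], list_2 + ["e", "e"])
print(base, padded)  # 0.6 0.6
```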

Part 2: Identify likely pirated custom themes

You are given two lists of dictionaries:

```python
pirated_themes = [
    {"theme_id": "p1", "features": ["a", "b", "c"]},
    {"theme_id": "p2", "features": ["x", "y", "z"]}
]

custom_themes = [
    {"theme_id": "c1", "features": ["a", "b", "d"]},
    {"theme_id": "c2", "features": ["x", "y", "z"]}
]
```

Each theme dictionary contains:

  • theme_id: a unique theme identifier.
  • features: a list of strings representing extracted theme attributes, assets, file hashes, CSS classes, or other comparable signals.
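Since the problem also asks for graceful handling of missing or empty features, it helps to normalize each theme's features in one place. This sketch assumes that a missing key, an explicit None, and an empty list should all count as an empty feature set; the helper name feature_set is hypothetical.

```python
from typing import Any, Mapping


def feature_set(theme: Mapping[str, Any]) -> frozenset:
    """Read a theme's "features" as a frozenset without modifying the theme."""
    features = theme.get("features")
    if not features:
        # Missing key, None, or an empty list all become an empty feature set.
        return frozenset()
    # frozenset() copies the values, so the caller's list is left untouched.
    return frozenset(features)
```

Returning a frozenset rather than the original list means later set operations cannot accidentally modify the input dictionaries.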

Implement a function:

```python
def find_likely_pirated_themes(pirated_themes, custom_themes, threshold=0.8):
    ...
```

For each custom theme:

  1. Compare it with every known pirated theme using Jaccard similarity over features.
  2. Find the best matching pirated theme.
  3. Return custom themes whose best similarity score is greater than or equal to threshold.
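The three steps can be sketched as follows. This is one possible implementation, not a reference solution. The helper names _features and _jaccard are hypothetical, missing or None features are assumed to mean an empty set, ties between equally similar pirated themes go to the smaller theme_id, and flagged results are sorted by descending score and then custom_theme_id. The problem only asks for deterministic output; these particular orderings are choices.

```python
from typing import Any, Iterable, Mapping


def _features(theme: Mapping[str, Any]) -> frozenset:
    # Assumption: a missing or None "features" value counts as an empty set.
    return frozenset(theme.get("features") or ())


def _jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    if not union:
        return 1.0  # both empty, matching the Part 1 convention
    return len(a & b) / len(union)


def find_likely_pirated_themes(
    pirated_themes: Iterable[Mapping[str, Any]],
    custom_themes: Iterable[Mapping[str, Any]],
    threshold: float = 0.8,
) -> list:
    # Build pirated feature sets once; the input dicts and lists are only read.
    pirated = [(p["theme_id"], _features(p)) for p in pirated_themes]
    flagged = []
    for custom in custom_themes:
        custom_set = _features(custom)
        best_id, best_score = None, -1.0
        for pirated_id, pirated_set in pirated:
            score = _jaccard(custom_set, pirated_set)
            # On a tie, prefer the lexicographically smaller pirated theme_id
            # so the chosen match does not depend on input order.
            if score > best_score or (score == best_score and pirated_id < best_id):
                best_id, best_score = pirated_id, score
        if best_id is not None and best_score >= threshold:
            flagged.append(
                {
                    "custom_theme_id": custom["theme_id"],
                    "matched_pirated_theme_id": best_id,
                    "similarity_score": best_score,
                }
            )
    # Highest score first, then custom_theme_id, for a reproducible order.
    flagged.sort(key=lambda r: (-r["similarity_score"], r["custom_theme_id"]))
    return flagged
```

Reusing the Part 1 convention has one side effect worth raising in an interview: a custom theme with no features scores 1.0 against a pirated theme that also has no features, so it would be flagged. Skipping themes with empty feature sets is a reasonable alternative. The work grows with the number of custom themes times the number of pirated themes, since every pair is compared.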

The returned result should include, for each flagged custom theme:

  • custom_theme_id
  • matched_pirated_theme_id
  • similarity_score
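For the sample data, a brute-force table of pairwise scores shows which custom theme should be flagged at the default threshold of 0.8. The feature lists are copied from the example as sets; every set is non-empty, so no empty-input branch is needed here.

```python
pirated = {"p1": {"a", "b", "c"}, "p2": {"x", "y", "z"}}
custom = {"c1": {"a", "b", "d"}, "c2": {"x", "y", "z"}}

scores = {}
for c_id, c_set in sorted(custom.items()):
    for p_id, p_set in sorted(pirated.items()):
        # All sets here are non-empty, so the union is never empty.
        scores[(c_id, p_id)] = len(c_set & p_set) / len(c_set | p_set)
        print(c_id, p_id, scores[(c_id, p_id)])
```

The best match for c1 is p1 at 0.5 (a and b shared out of a, b, c, d), which is below 0.8, while c2 matches p2 at 1.0. Only c2 should appear in the result, with matched_pirated_theme_id "p2" and similarity_score 1.0.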

Handle missing or empty features lists gracefully, do not mutate the input objects, and make the output deterministic.
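Mutation and determinism can be checked mechanically. This sketch assumes a stricter reading of "deterministic": the result should stay identical when the input lists are reordered, not merely across repeated runs on the same input. The helper name and its parameters are hypothetical; it accepts any function with the two-list signature of find_likely_pirated_themes.

```python
import copy
import random
from typing import Any, Callable


def check_pure_and_order_independent(
    fn: Callable[[list, list], Any],
    pirated_themes: list,
    custom_themes: list,
    trials: int = 5,
    seed: int = 0,
) -> Any:
    """Hypothetical test helper: fn must not mutate its inputs, and its result
    must not change when the input lists are shuffled."""
    snapshot = copy.deepcopy((pirated_themes, custom_themes))
    baseline = fn(pirated_themes, custom_themes)
    # The call must leave both input lists (and the dicts inside) unchanged.
    assert (pirated_themes, custom_themes) == snapshot, "inputs were mutated"
    rng = random.Random(seed)  # seeded so the check itself is reproducible
    for _ in range(trials):
        p, c = pirated_themes[:], custom_themes[:]
        rng.shuffle(p)
        rng.shuffle(c)
        assert fn(p, c) == baseline, "result depends on input order"
    return baseline
```

Deep-copying before the call matters because a function can leave the outer lists intact while still modifying a nested features list.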



© 2026 PracHub. All rights reserved.