PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of similarity metrics (Jaccard similarity), set-based data representations, and the competency to implement and reason about list- or token-level similarity comparisons.

  • Medium
  • Shopify
  • Coding & Algorithms
  • Data Scientist

Identify Pirate-Themed Custom Themes Using Jaccard Similarity

Company: Shopify

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

##### Scenario Engineering receives two lists and must identify custom themes that look like Pirate themes. ##### Question a) Write a Python function that computes the Jaccard similarity len(intersection)/len(union) between any two lists. b) Given pirate_themes (list[dict]) and custom_themes (list[dict]), return custom themes whose similarity to at least one pirate theme exceeds a chosen threshold. c) Describe how you would handle edge cases and scale this computation to millions of themes. ##### Hints Use set operations; handle empty lists; consider MinHash or locality-sensitive hashing for scale.

Quick Answer: This question evaluates understanding of similarity metrics (Jaccard similarity), set-based data representations, and the competency to implement and reason about list- or token-level similarity comparisons.

You are given two lists: pirate_themes and custom_themes. Each element is a dict with keys 'name' (string) and 'tags' (list of strings). Define the Jaccard similarity between two tag lists as |intersection| / |union| using set semantics (duplicates ignored). If both tag lists are empty, define similarity as 1.0. Return the list of names of custom themes whose similarity to at least one pirate theme is greater than or equal to threshold. Preserve the order of custom_themes in the output. Tag comparison is case-sensitive. If pirate_themes is empty, return an empty list.

Constraints

  • 0 <= len(pirate_themes), len(custom_themes) <= 10^4
  • Each theme dict contains keys: 'name' (unique string) and 'tags' (list of strings)
  • 0 <= len(tags) <= 10^3 per theme; duplicates in 'tags' are ignored
  • 0.0 <= threshold <= 1.0
  • Tag comparison is case-sensitive
  • Return names in the same order as they appear in custom_themes

Hints

  1. Convert each theme's tags list to a set to ignore duplicates.
  2. Jaccard similarity is len(A ∩ B) / len(A ∪ B); if the union is empty, treat similarity as 1.0.
  3. Precompute sets for pirate themes to avoid repeated conversions.
  4. Short-circuit once any pirate theme meets the threshold for a custom theme.
  5. For large-scale data, consider MinHash/LSH or inverted-index filtering to reduce pairwise comparisons.
Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Grid Robot Command Simulator - Shopify (medium)
  • Compute Theme Similarity - Shopify (medium)
  • Compute Jaccard Similarity for Lists - Shopify (medium)
  • Implement URL Shortening Codec - Shopify (medium)
  • Simulate a rover fleet - Shopify (medium)