PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of similarity metrics and competency in comparing string collections, reflecting knowledge of set-theoretic concepts and manipulation of collection data structures used in data processing.

  • Medium
  • Shopify
  • Coding & Algorithms
  • Data Scientist

Calculate Jaccard Similarity Score for Two String Lists

Company: Shopify

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

##### Scenario During pair programming, the interviewer asks you to compare two lists of merchant tags and quantify their similarity. ##### Question Implement a Python function that takes two lists of strings and returns their Jaccard similarity score. ##### Hints Convert both lists to sets, then compute |A ∩ B| ÷ |A ∪ B|.

Quick Answer: This question evaluates understanding of similarity metrics and competency in comparing string collections, reflecting knowledge of set-theoretic concepts and manipulation of collection data structures used in data processing.

During pair programming, the interviewer asks you to compare two lists of merchant tags and quantify their similarity. Implement a function `jaccardSimilarity(listA, listB)` that takes two lists of strings and returns their **Jaccard similarity score**: the size of the intersection of the two sets of tags divided by the size of their union. Formally, for sets A and B derived from the input lists: Jaccard(A, B) = |A ∩ B| / |A ∪ B| Duplicate tags within a single list should be treated as one (the lists are converted to sets). If both lists are empty (the union is empty), return 0.0 to avoid dividing by zero. Example: listA = ["a", "b", "c"], listB = ["b", "c", "d"] Intersection = {"b", "c"} (size 2), Union = {"a", "b", "c", "d"} (size 4) Result = 2 / 4 = 0.5

Constraints

  • 0 <= len(listA), len(listB) <= 10^5
  • Each tag is a non-empty string of printable ASCII characters.
  • Tags are case-sensitive ("Sale" and "sale" are distinct).
  • Duplicate tags within the same list count once.
  • If both lists are empty (union is empty), return 0.0.

Examples

Input: (['a', 'b', 'c'], ['b', 'c', 'd'])

Expected Output: 0.5

Explanation: Intersection {b, c} has size 2; union {a, b, c, d} has size 4; 2/4 = 0.5.

Input: (['shoes', 'apparel', 'sale'], ['shoes', 'apparel', 'sale'])

Expected Output: 1.0

Explanation: Identical tag sets: intersection equals union, so the score is 1.0.

Input: (['x', 'y'], ['p', 'q'])

Expected Output: 0.0

Explanation: No shared tags: intersection is empty (size 0), so the score is 0.0.

Input: ([], [])

Expected Output: 0.0

Explanation: Both lists empty: the union is empty, so we return 0.0 to avoid dividing by zero.

Input: (['a', 'a', 'b'], ['a', 'b', 'b'])

Expected Output: 1.0

Explanation: Duplicates collapse: both become {a, b}; intersection equals union, score 1.0.

Input: (['a', 'b', 'c', 'd'], ['c', 'd'])

Expected Output: 0.5

Explanation: Second list is a subset: intersection {c, d} size 2, union {a, b, c, d} size 4; 2/4 = 0.5.

Input: (['tag1'], [])

Expected Output: 0.0

Explanation: One empty list: intersection is empty, union is {tag1} size 1; 0/1 = 0.0.

Hints

  1. Convert both lists to sets to discard duplicates and enable fast membership tests.
  2. The intersection size is |A ∩ B| and the union size is |A ∪ B|; the score is their ratio.
  3. Guard against division by zero: when the union is empty (both lists empty), return 0.0.
Last updated: Jun 25, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Grid Robot Command Simulator - Shopify (medium)
  • Compute Theme Similarity - Shopify (medium)
  • Compute Jaccard Similarity for Lists - Shopify (medium)
  • Implement URL Shortening Codec - Shopify (medium)
  • Simulate a rover fleet - Shopify (medium)