How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Shopify.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Shopify during technical interviews.

Calculate Jaccard Similarity Score for Two String Lists

Company: Shopify

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

##### Scenario During pair programming, the interviewer asks you to compare two lists of merchant tags and quantify their similarity. ##### Question Implement a Python function that takes two lists of strings and returns their Jaccard similarity score. ##### Hints Convert both lists to sets, then compute |A ∩ B| ÷ |A ∪ B|.

Quick Answer: This question evaluates understanding of similarity metrics and competency in comparing string collections, reflecting knowledge of set-theoretic concepts and manipulation of collection data structures used in data processing.

During pair programming, the interviewer asks you to compare two lists of merchant tags and quantify their similarity. Implement a function `jaccardSimilarity(listA, listB)` that takes two lists of strings and returns their **Jaccard similarity score**: the size of the intersection of the two sets of tags divided by the size of their union. Formally, for sets A and B derived from the input lists: Jaccard(A, B) = |A ∩ B| / |A ∪ B| Duplicate tags within a single list should be treated as one (the lists are converted to sets). If both lists are empty (the union is empty), return 0.0 to avoid dividing by zero. Example: listA = ["a", "b", "c"], listB = ["b", "c", "d"] Intersection = {"b", "c"} (size 2), Union = {"a", "b", "c", "d"} (size 4) Result = 2 / 4 = 0.5

Constraints

0 <= len(listA), len(listB) <= 10^5
Each tag is a non-empty string of printable ASCII characters.
Tags are case-sensitive ("Sale" and "sale" are distinct).
Duplicate tags within the same list count once.
If both lists are empty (union is empty), return 0.0.

Examples

Input: (['a', 'b', 'c'], ['b', 'c', 'd'])

Expected Output: 0.5

Explanation: Intersection {b, c} has size 2; union {a, b, c, d} has size 4; 2/4 = 0.5.

Input: (['shoes', 'apparel', 'sale'], ['shoes', 'apparel', 'sale'])

Expected Output: 1.0

Explanation: Identical tag sets: intersection equals union, so the score is 1.0.

Input: (['x', 'y'], ['p', 'q'])

Expected Output: 0.0

Explanation: No shared tags: intersection is empty (size 0), so the score is 0.0.

Input: ([], [])

Expected Output: 0.0

Explanation: Both lists empty: the union is empty, so we return 0.0 to avoid dividing by zero.

Input: (['a', 'a', 'b'], ['a', 'b', 'b'])

Expected Output: 1.0

Explanation: Duplicates collapse: both become {a, b}; intersection equals union, score 1.0.

Input: (['a', 'b', 'c', 'd'], ['c', 'd'])

Expected Output: 0.5

Explanation: Second list is a subset: intersection {c, d} size 2, union {a, b, c, d} size 4; 2/4 = 0.5.

Input: (['tag1'], [])

Expected Output: 0.0

Explanation: One empty list: intersection is empty, union is {tag1} size 1; 0/1 = 0.0.

Hints

Convert both lists to sets to discard duplicates and enable fast membership tests.
The intersection size is |A ∩ B| and the union size is |A ∪ B|; the score is their ratio.
Guard against division by zero: when the union is empty (both lists empty), return 0.0.

Quick Overview

This question evaluates understanding of similarity metrics and competency in comparing string collections, reflecting knowledge of set-theoretic concepts and manipulation of collection data structures used in data processing.