PracHub

Quick Overview

This question evaluates understanding of Jaccard similarity and set-based operations, assessing competency in deduplicating lists, precise string comparisons (case-sensitivity and whitespace), and handling edge cases.

  • medium
  • Shopify
  • Coding & Algorithms
  • Data Scientist

Compute Jaccard Similarity for Lists

Company: Shopify

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Implement a Python function to compute the Jaccard similarity between two lists of strings.

```python
def jaccard_similarity(a: list[str], b: list[str]) -> float:
    pass
```

### Requirements

- Treat each input list as a set of unique strings; duplicate strings within the same list should not affect the result.
- Jaccard similarity is defined as: `size(intersection of unique strings) / size(union of unique strings)`
- String comparison is case-sensitive, and whitespace is significant.
- If both lists are empty, return `1.0`.
- If exactly one list is empty, return `0.0`.
- Do not use third-party libraries.
- Aim for `O(n + m)` time complexity, where `n` and `m` are the lengths of the two input lists.

### Examples

```python
jaccard_similarity(["red", "blue", "blue"], ["blue", "green"]) == 1 / 3
jaccard_similarity(["a", "b"], ["a", "b"]) == 1.0
jaccard_similarity([], ["a"]) == 0.0
jaccard_similarity([], []) == 1.0
```

Write a function that computes the Jaccard similarity between two lists of strings. Treat each list as a set of unique strings, so duplicate values within the same list do not change the result. The Jaccard similarity is defined as the size of the intersection of the unique strings divided by the size of the union of the unique strings. String comparison is case-sensitive, and whitespace is significant. If both lists are empty, return 1.0. If exactly one list is empty, return 0.0. Your goal is to solve this in O(n + m) average time, where n and m are the lengths of the two input lists.
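One possible way to meet these requirements is sketched below. It is a minimal illustration rather than an official solution, and it assumes Python 3.9+ so that the `list[str]` annotation from the problem signature is valid. Building the two sets takes O(n + m) time on average, and the case where exactly one list is empty needs no special branch because the intersection is then empty while the union is not.

```python
def jaccard_similarity(a: list[str], b: list[str]) -> float:
    # Deduplicate each list. Set membership relies on exact string
    # equality, so "A" != "a" and "x " != "x" without extra handling.
    set_a, set_b = set(a), set(b)

    union = set_a | set_b
    if not union:
        # Both inputs are empty: the problem defines this case as 1.0.
        return 1.0

    # If exactly one input is empty, the intersection is empty and
    # the union is not, so this expression evaluates to 0.0.
    return len(set_a & set_b) / len(union)
```

The only explicit guard is for the empty union, which would otherwise raise `ZeroDivisionError`.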

Constraints

  • 0 <= len(a), len(b) <= 100000
  • Each element of `a` and `b` is a string
  • Duplicate strings within the same list should be ignored when computing similarity
  • String comparison is case-sensitive, and whitespace is significant
  • Do not use third-party libraries

Examples

Input: (["red", "blue", "blue"], ["blue", "green"])

Expected Output: 0.3333333333333333

Explanation: The unique sets are {"red", "blue"} and {"blue", "green"}. Their intersection has size 1 and their union has size 3, so the result is 1/3.

Input: (["a", "b"], ["a", "b"])

Expected Output: 1.0

Explanation: Both lists contain the same unique strings, so the intersection and union both have size 2.

Input: ([], ["a"])

Expected Output: 0.0

Explanation: Exactly one list is empty, so the similarity is defined as 0.0.

Input: ([], [])

Expected Output: 1.0

Explanation: Both lists are empty, so the similarity is defined as 1.0.

Input: (["A", "a", "x "], ["a", "x"])

Expected Output: 0.25

Explanation: Comparison is case-sensitive and whitespace is significant. The unique sets are {"A", "a", "x "} and {"a", "x"}. Their intersection has size 1 and union has size 4, so the result is 1/4.
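The case-sensitivity and whitespace behavior in this example can be checked directly with Python's built-in set operators. The variable names below are illustrative only and are not part of the problem statement.

```python
left = set(["A", "a", "x "])   # "x " keeps its trailing space
right = set(["a", "x"])

shared = left & right          # only "a" matches exactly
combined = left | right        # "A", "a", "x ", "x" are all distinct

similarity = len(shared) / len(combined)
print(shared, len(combined), similarity)  # {'a'} 4 0.25
```

No normalization such as `strip()` or `lower()` is applied, which is what keeps `"x "` and `"x"`, and `"A"` and `"a"`, apart.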

Hints

  1. Convert each list into a set first so duplicates do not affect the answer.
  2. Be careful with the special case where both sets are empty, because the union size would be 0.
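Following these two hints, one variant (a sketch, not the reference answer) checks the both-empty case up front and derives the union size from the identity |A ∪ B| = |A| + |B| − |A ∩ B| instead of building a third set. The function name `jaccard_by_counts` is hypothetical and chosen only for this illustration.

```python
def jaccard_by_counts(a: list[str], b: list[str]) -> float:
    # Hint 1: deduplicate by converting each list to a set.
    set_a, set_b = set(a), set(b)

    # Hint 2: with two empty sets the union size is 0, so return the
    # defined value before any division happens.
    if not set_a and not set_b:
        return 1.0

    # Count shared strings by scanning the smaller set and probing
    # the larger one (average O(1) per lookup).
    small, large = (set_a, set_b) if len(set_a) <= len(set_b) else (set_b, set_a)
    shared = sum(1 for s in small if s in large)

    # Inclusion-exclusion gives the union size without building it.
    union_size = len(set_a) + len(set_b) - shared
    return shared / union_size
```

This keeps the extra memory to the two deduplicated sets, at the cost of slightly more code than using the `|` and `&` operators.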
Last updated: May 7, 2026

Related Coding Questions

  • Compute Theme Similarity - Shopify (medium)
  • Implement URL Shortening Codec - Shopify (medium)
  • Simulate a rover fleet - Shopify (medium)
  • Simulate robot moves on a grid - Shopify (medium)
  • Implement a Capacity-Bounded Cache - Shopify (medium)