How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Moveworks.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Moveworks during technical interviews.

Compute Jaccard similarity between two strings

Quick Overview

This question evaluates string tokenization, set-based similarity metrics (Jaccard index) and basic set operations, testing competency in text processing and algorithmic reasoning within the Coding & Algorithms domain and requiring practical application of these concepts.

Company: Moveworks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

## Jaccard Similarity of Two Strings Given two strings `a` and `b`, compute their **Jaccard similarity** based on token sets. ### Tokenization rules - Convert to lowercase. - Split on any non-alphabetic character (e.g., spaces, punctuation). - Discard empty tokens. - Treat each string as a **set** of unique tokens (ignore duplicates). ### Jaccard similarity Let `A` be the token set from `a` and `B` from `b`. \[ J(A,B) = \frac{|A \cap B|}{|A \cup B|} \] ### Output Return the similarity as a floating-point number. ### Edge cases - If both sets are empty, define similarity as `1.0`. ### Example - `a = "I like coffee, coffee"` - `b = "coffee is great"` - `A = {i, like, coffee}` - `B = {coffee, is, great}` - Intersection size = 1, Union size = 5 → similarity = `0.2`

Quick Answer: This question evaluates string tokenization, set-based similarity metrics (Jaccard index) and basic set operations, testing competency in text processing and algorithmic reasoning within the Coding & Algorithms domain and requiring practical application of these concepts.

Given two strings `a` and `b`, compute their **Jaccard similarity** based on token sets. ### Tokenization rules - Convert to lowercase. - Split on any non-alphabetic character (spaces, punctuation, digits, etc.). - Discard empty tokens. - Treat each string as a **set** of unique tokens (duplicates are ignored). ### Jaccard similarity Let `A` be the token set from `a` and `B` the token set from `b`. ``` J(A, B) = |A ∩ B| / |A ∪ B| ``` Return the similarity as a floating-point number. ### Edge case - If **both** token sets are empty, define the similarity as `1.0`. ### Example - `a = "I like coffee, coffee"` - `b = "coffee is great"` - `A = {i, like, coffee}`, `B = {coffee, is, great}` - Intersection size = 1, Union size = 5 → similarity = `0.2`

Constraints

0 <= len(a), len(b)
Strings may contain letters, digits, punctuation, and whitespace.
Tokenization is case-insensitive; split on any non-alphabetic character.
If both token sets are empty, the result is defined as 1.0.

Examples

Input: ("I like coffee, coffee", "coffee is great")

Expected Output: 0.2

Explanation: A = {i, like, coffee}, B = {coffee, is, great}. Intersection {coffee} = 1, union size = 5 → 1/5 = 0.2.

Input: ("", "")

Expected Output: 1.0

Explanation: Both token sets are empty, so by definition the similarity is 1.0.

Input: ("hello world", "hello world")

Expected Output: 1.0

Explanation: Identical token sets {hello, world}; intersection equals union, so 2/2 = 1.0.

Input: ("apple", "banana")

Expected Output: 0.0

Explanation: Disjoint sets {apple} and {banana}; intersection 0, union 2 → 0.0.

Input: ("The quick brown fox", "the QUICK red fox")

Expected Output: 0.6

Explanation: Case-insensitive: A = {the, quick, brown, fox}, B = {the, quick, red, fox}. Intersection {the, quick, fox} = 3, union size = 5 → 3/5 = 0.6.

Input: ("a.b.c", "b,c,d")

Expected Output: 0.5

Explanation: Punctuation splits tokens: A = {a, b, c}, B = {b, c, d}. Intersection {b, c} = 2, union {a, b, c, d} = 4 → 0.5.

Input: ("only here", "")

Expected Output: 0.0

Explanation: A = {only, here}, B = {}. Intersection 0, union 2 → 0.0. Only one side empty, so not the 1.0 special case.

Input: ("123 456", "hello")

Expected Output: 0.0

Explanation: Digits are non-alphabetic, so "123 456" yields an empty token set. A = {}, B = {hello}; intersection 0, union 1 → 0.0.

Hints

Lowercase each string, then split on runs of non-alphabetic characters and drop empty pieces to build a set of unique tokens.
Use set intersection and union: similarity = |A ∩ B| / |A ∪ B|.
Handle the special case first: if the union is empty (both sets empty), return 1.0 to avoid dividing by zero.

Quick Overview