
Compute Jaccard similarity between two strings

Last updated: Mar 29, 2026

Quick Overview

This question tests string tokenization, basic set operations, and the Jaccard index (a set-based similarity metric). It belongs to the Coding & Algorithms domain and requires applying these concepts to text processing.

Company: Moveworks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

Dec 15, 2025, 12:00 AM

Jaccard Similarity of Two Strings

Given two strings a and b, compute their Jaccard similarity based on token sets.

Tokenization rules

  • Convert to lowercase.
  • Split on any non-alphabetic character (e.g., spaces, punctuation).
  • Discard empty tokens.
  • Treat each string as a set of unique tokens (ignore duplicates).
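
The four rules above can be sketched as a single helper. This is a minimal sketch, not a reference solution: the name `tokenize` is hypothetical, and it assumes "alphabetic" means the ASCII letters a-z, which the problem statement does not spell out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Return the set of unique lowercase alphabetic tokens in text."""
    # Lowercase first, then collect maximal runs of letters.
    # Assumption: "alphabetic" means ASCII a-z only.
    # Matching letter runs is equivalent to splitting on non-letters
    # and discarding empty tokens; set() drops duplicates.
    return set(re.findall(r"[a-z]+", text.lower()))
```

Extracting letter runs with `re.findall` avoids the empty strings that `re.split` would produce at leading, trailing, or consecutive separators, so no separate filtering step is needed.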

Jaccard similarity

Let A be the token set from a and B from b.

\[ J(A,B) = \frac{|A \cap B|}{|A \cup B|} \]

Output

Return the similarity as a floating-point number.

Edge cases

  • If both sets are empty, the formula is undefined (0/0); define the similarity as 1.0.
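
Putting the tokenization, the formula, and the empty-set rule together, one possible end-to-end solution looks like the sketch below. The function name `jaccard_similarity` is hypothetical, and the tokenizer again assumes ASCII letters only.

```python
import re


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the token sets of two strings."""
    # Tokenize: lowercase, keep runs of letters, deduplicate.
    # Assumption: "alphabetic" means ASCII a-z only.
    set_a = set(re.findall(r"[a-z]+", a.lower()))
    set_b = set(re.findall(r"[a-z]+", b.lower()))

    union = set_a | set_b
    # Edge case: both sets empty means an empty union, defined as 1.0.
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)
```

Checking the union rather than each set separately covers the edge case in one test. When only one set is empty, the union is non-empty and the intersection is empty, so the function returns 0.0 without special handling.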

Example

  • a = "I like coffee, coffee"
  • b = "coffee is great"
  • A = {i, like, coffee}
  • B = {coffee, is, great}
  • Intersection size = 1, Union size = 5 → similarity = 0.2
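
The example can be checked step by step with Python's built-in set operators. This is only an illustrative sketch; the variable names are hypothetical and the tokenizer assumes ASCII letters.

```python
import re

a = "I like coffee, coffee"
b = "coffee is great"

# Token sets (assumption: "alphabetic" means ASCII a-z only).
tokens_a = set(re.findall(r"[a-z]+", a.lower()))
tokens_b = set(re.findall(r"[a-z]+", b.lower()))

common = tokens_a & tokens_b    # intersection
combined = tokens_a | tokens_b  # union
similarity = len(common) / len(combined)

print(sorted(common))    # ['coffee']
print(sorted(combined))  # ['coffee', 'great', 'i', 'is', 'like']
print(similarity)        # 0.2
```

The repeated "coffee" in `a` contributes only once because tokens are stored in a set, which is why the union has 5 elements rather than 6.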
