What difficulty level is this interview question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Yelp.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Yelp during technical interviews.

Compute and Rank by Jaccard Similarity

Last updated: Mar 29, 2026

Quick Overview

The prompt evaluates set-based similarity metrics (Jaccard), robust tokenization and normalization for text data including Unicode and punctuation edge cases, streaming/iterator processing for very long inputs, and algorithmic efficiency for top-k retrieval and stable ranking.

Yelp

Oct 13, 2025, 9:49 PM

Data Scientist

Onsite

Coding & Algorithms

Implement two functions: (1) jaccard_similarity(s1, s2) that tokenizes by splitting on any non-alphanumeric character, lowercases, treats tokens as sets (duplicates ignored), and returns |A ∩ B| / |A ∪ B| with 0.0 when the union is empty; (2) rank_by_jaccard(reviews, target) that returns the reviews sorted descending by similarity to target, stable on ties by original index. Discuss how you would handle very long inputs (streaming/iterators), Unicode and punctuation edge cases, and how to achieve O(T) expected time where T is total tokens. Finally, extend rank_by_jaccard to return the top-k efficiently for k ≪ n without computing full similarities for all n reviews; justify your data structures and complexity.
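One way the two functions and the top-k extension could be sketched in Python is shown below. This is a minimal sketch under stated assumptions, not a reference solution: the helper names `iter_tokens`, `token_set`, `_jaccard_sets`, and `top_k_by_jaccard` are hypothetical, "alphanumeric" is assumed to mean Unicode letters and digits (so the underscore counts as a separator), and the top-k pruning relies on the size bound J(A, B) ≤ min(|A|, |B|) / max(|A|, |B|), which is one of several possible strategies.

```python
import heapq
import re
from typing import Iterator, List, Set, Tuple

# Assumption: a token is a run of Unicode letters/digits; "_" and all
# punctuation or whitespace act as separators.
_TOKEN = re.compile(r"[^\W_]+")


def iter_tokens(text: str) -> Iterator[str]:
    """Lazily yield lowercased tokens so no full token list is built."""
    for match in _TOKEN.finditer(text):
        yield match.group().lower()


def token_set(text: str) -> Set[str]:
    """Collapse duplicates: tokens are treated as a set."""
    return set(iter_tokens(text))


def _jaccard_sets(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, with 0.0 when the union is empty."""
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0


def jaccard_similarity(s1: str, s2: str) -> float:
    return _jaccard_sets(token_set(s1), token_set(s2))


def rank_by_jaccard(reviews: List[str], target: str) -> List[str]:
    """Sort descending by similarity; ties keep original index order."""
    target_tokens = token_set(target)  # tokenize the target only once
    scores = [_jaccard_sets(token_set(r), target_tokens) for r in reviews]
    order = sorted(range(len(reviews)), key=lambda i: (-scores[i], i))
    return [reviews[i] for i in order]


def top_k_by_jaccard(reviews: List[str], target: str, k: int) -> List[str]:
    """Top-k via a size-k min-heap plus a size-based upper bound."""
    if k <= 0:
        return []
    target_tokens = token_set(target)
    # Heap entries are (score, -index); the root is the current worst
    # kept item (lowest score, then largest index).
    heap: List[Tuple[float, int]] = []
    for i, review in enumerate(reviews):
        tokens = token_set(review)
        if len(heap) == k:
            small, large = sorted((len(tokens), len(target_tokens)))
            bound = small / large if large else 0.0
            # A later index loses ties, so bound <= worst score means
            # this review cannot enter the heap: skip the intersection.
            if bound <= heap[0][0]:
                continue
        item = (_jaccard_sets(tokens, target_tokens), -i)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    best = sorted(heap, key=lambda t: (-t[0], -t[1]))
    return [reviews[-neg_i] for _, neg_i in best]
```

With hash sets, tokenizing and intersecting cost O(T) expected time over T total tokens; the full ranking adds an O(n log n) sort, while the heap variant adds O(n log k). In this sketch the pruning only skips the intersection step, since every review is still tokenized to learn its set size; if token-set sizes were cached, or an inverted index from target tokens to reviews were available, whole reviews could be skipped. `iter_tokens` is lazy but expects the text in memory, so truly streamed input would need chunked reads that handle tokens spanning chunk boundaries. Text with decomposed accents may also need Unicode normalization (for example NFC) before tokenizing, because combining marks are likely to be treated as separators by this pattern.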
