Implement two functions:

(1) jaccard_similarity(s1, s2): tokenize by splitting on any non-alphanumeric character, lowercase, treat tokens as sets (duplicates ignored), and return |A ∩ B| / |A ∪ B|, with 0.0 when the union is empty.

(2) rank_by_jaccard(reviews, target): return the reviews sorted descending by similarity to target, stable on ties by original index.

Discuss how you would handle very long inputs (streaming/iterators), Unicode and punctuation edge cases, and how to achieve O(T) expected time, where T is the total number of tokens.

Finally, extend rank_by_jaccard to return the top-k results efficiently for k ≪ n without computing full similarities for all n reviews; justify your data structures and complexity.
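A minimal reference sketch of one possible solution (the helper names `tokenize` and `top_k_by_jaccard` are my own, not part of the prompt; the ASCII-only token regex is an assumption that the Unicode discussion asked for above would relax, e.g. via `str.isalnum`):

```python
import heapq
import re

# Assumption: ASCII-only tokenization. For full Unicode support,
# a pattern like r"\W+" or filtering with str.isalnum is preferable.
_NON_ALNUM = re.compile(r"[^0-9a-zA-Z]+")


def tokenize(text):
    """Lowercase, split on runs of non-alphanumeric chars, return a set."""
    return {t for t in _NON_ALNUM.split(text.lower()) if t}


def jaccard_similarity(s1, s2):
    """|A ∩ B| / |A ∪ B| over token sets; 0.0 when the union is empty."""
    a, b = tokenize(s1), tokenize(s2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_jaccard(reviews, target):
    """Reviews sorted descending by similarity to target, stable on ties."""
    t = tokenize(target)  # tokenize the target once, not per comparison

    def sim(tokens):
        union = tokens | t
        return len(tokens & t) / len(union) if union else 0.0

    # sorted() is stable, so negating the key preserves original order on ties.
    return sorted(reviews, key=lambda r: -sim(tokenize(r)))


def top_k_by_jaccard(reviews, target, k):
    """Top-k reviews via a size-k min-heap: O(T + n log k) instead of O(n log n)."""
    t = tokenize(target)
    heap = []  # min-heap of (similarity, -index, review); -index favors earlier reviews on ties
    for i, r in enumerate(reviews):
        tokens = tokenize(r)
        union = tokens | t
        s = len(tokens & t) / len(union) if union else 0.0
        item = (s, -i, r)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the current k-th best
    # Highest similarity first; ties broken by smaller original index.
    return [r for _, _, r in sorted(heap, reverse=True)]
```

Tokenization and set intersection are O(T) expected via hashing, and the heap bounds memory at O(k). Note that this sketch still *tokenizes* all n reviews; to skip reviews entirely, one would add an inverted index from tokens to review ids so that only candidates sharing at least one token with the target are ever scored (all others have similarity 0).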