Part A — Anagram checker:
Write a function is_anagram(a: str, b: str, locale: str = 'en') -> bool that returns True iff a and b are anagrams under the following rules:
- Ignore case, whitespace, and Unicode punctuation.
- Normalize Unicode using NFKD; when locale='en', strip combining marks so accented letters compare equal to their base letters (e.g., 'é' -> 'e').
- Consider only letters and digits after normalization.
- Must handle inputs of up to 10^7 characters each in O(n) time with O(σ) additional memory, where σ is the alphabet size under the rules above. Streaming solutions that make a single pass over each string are preferred; do not materialize full normalized strings.
- Return False immediately if, after filtering, the multiset lengths differ.
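The requirements above can be sketched as follows. This is a minimal reference sketch, not the required answer: it implements only the locale='en' behavior, and the helper name _letters is an invention for illustration. It streams both strings in a single pass each, keeping only a Counter of size O(σ).

```python
import unicodedata
from collections import Counter

def _letters(s: str):
    # Stream filtered, normalized code points one at a time:
    # NFKD-decompose each character, drop combining marks
    # (so 'é' compares equal to 'e'), keep only letters/digits,
    # and lowercase for case-insensitive comparison.
    for ch in s:
        for d in unicodedata.normalize('NFKD', ch):
            if unicodedata.combining(d):
                continue
            if d.isalnum():
                yield d.lower()

def is_anagram(a: str, b: str, locale: str = 'en') -> bool:
    # Count characters of a, then decrement while streaming b.
    # Memory is O(σ): only the counter, never a normalized copy.
    counts = Counter(_letters(a))
    for ch in _letters(b):
        counts[ch] -= 1
        if counts[ch] < 0:
            return False  # b has a character (or count) a lacks
    # Any positive leftover means a had characters b lacks.
    return all(v == 0 for v in counts.values())
```

Decrementing as b streams lets the function fail fast without first computing both lengths, which satisfies the early-exit requirement in spirit while staying single-pass.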
Follow-ups:
1) How would you adapt it to be locale-aware so that German 'ß' compares equal to 'ss'?
2) How would you handle right-to-left scripts without breaking normalization?
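One possible direction for the 'ß' follow-up, sketched under the assumption that the per-character normalization step is gated on locale: Python's str.casefold already performs full Unicode case folding, which maps 'ß' to 'ss', so a German locale can opt into it while 'en' keeps plain lowercasing. The function name normalize_char is hypothetical.

```python
def normalize_char(ch: str, locale: str) -> str:
    # casefold() applies full Unicode case folding ('ß' -> 'ss');
    # gate it on locale so English comparisons are unaffected.
    return ch.casefold() if locale == 'de' else ch.lower()
```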
Part B — Stable de-duplication:
Write a function unique_stable(iterable, key=None, treat_nan_as_equal=False) that returns a lazy generator yielding the first occurrence of each distinct element while preserving original order.
- If key is None, distinctness is by the element itself (assume elements are hashable). If key is provided, distinctness is by key(x).
- If treat_nan_as_equal=True, treat all NaN floats as equal for distinctness even though NaN != NaN in Python.
- Target O(n) time and O(m) space, where m is the number of distinct keys seen. The implementation must be lazy (streaming) over the input and must not sort.
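A minimal sketch meeting these constraints, assuming a sentinel-object trick for NaN (the _NAN name is an invention): a seen-set gives O(1) membership checks, and yielding from a generator keeps the whole pipeline lazy.

```python
import math

def unique_stable(iterable, key=None, treat_nan_as_equal=False):
    # Lazy, order-preserving de-duplication: O(n) time, O(m) space,
    # where m is the number of distinct keys seen so far.
    seen = set()
    _NAN = object()  # single sentinel so every NaN maps to one key
    for x in iterable:
        k = x if key is None else key(x)
        if treat_nan_as_equal and isinstance(k, float) and math.isnan(k):
            k = _NAN  # collapse all NaNs, since NaN != NaN by default
        if k not in seen:
            seen.add(k)
            yield x
```

Without the sentinel, distinct NaN objects would each pass the `k not in seen` check (set lookup falls back to equality after identity, and NaN is unequal to every other NaN object), which is exactly the behavior treat_nan_as_equal=True is meant to override.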
Provide tight big-O bounds and brief tests covering edge cases: empty input, all duplicates, Unicode strings, very large inputs, and NaN handling.