Part A — Anagram checker:
Write a function is_anagram(a: str, b: str, locale: str = 'en') -> bool that returns True iff a and b are anagrams under the following rules:
- Ignore case, whitespace, and Unicode punctuation.
- Normalize Unicode using NFKD; when locale='en', strip combining marks so accented letters compare equal to their base letters (e.g., 'é' -> 'e').
- Consider only letters and digits after normalization.
- Must handle inputs of up to 10^7 characters each in O(n) time with O(σ) additional memory, where σ is the alphabet size under the rules above. Streaming solutions that make a single pass over each string are preferred; do not materialize full normalized strings.
- Return False immediately if, after filtering, the multiset lengths differ.
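The requirements above can be sketched as follows. This is a minimal reference sketch, not the required answer: it implements only the locale='en' behavior, and the helper name _letters is an invention for illustration. It streams both strings in a single pass each, keeping only a Counter of size O(σ).

```python
import unicodedata
from collections import Counter

def _letters(s: str):
    # Stream filtered, normalized code points one at a time:
    # NFKD-decompose each character, drop combining marks
    # (so 'é' compares equal to 'e'), keep only letters/digits,
    # and lowercase for case-insensitive comparison.
    for ch in s:
        for d in unicodedata.normalize('NFKD', ch):
            if unicodedata.combining(d):
                continue
            if d.isalnum():
                yield d.lower()

def is_anagram(a: str, b: str, locale: str = 'en') -> bool:
    # Count characters of a, then decrement while streaming b.
    # Memory is O(σ): only the counter, never a normalized copy.
    counts = Counter(_letters(a))
    for ch in _letters(b):
        counts[ch] -= 1
        if counts[ch] < 0:
            return False  # b has a character (or count) a lacks
    # Any positive leftover means a had characters b lacks.
    return all(v == 0 for v in counts.values())
```

Decrementing as b streams lets the function fail fast without first computing both lengths, which satisfies the early-exit requirement in spirit while staying single-pass.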
Follow-ups:
1) How would you adapt it to be locale-aware so that German 'ß' compares equal to 'ss'?
2) How would you handle right-to-left scripts without breaking normalization?
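One possible direction for the 'ß' follow-up, sketched under the assumption that the per-character normalization step is gated on locale: Python's str.casefold already performs full Unicode case folding, which maps 'ß' to 'ss', so a German locale can opt into it while 'en' keeps plain lowercasing. The function name normalize_char is hypothetical.

```python
def normalize_char(ch: str, locale: str) -> str:
    # casefold() applies full Unicode case folding ('ß' -> 'ss');
    # gate it on locale so English comparisons are unaffected.
    return ch.casefold() if locale == 'de' else ch.lower()
```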
Part B — Stable de-duplication:
Write a function unique_stable(iterable, key=None, treat_nan_as_equal=False) that returns a lazy generator yielding the first occurrence of each distinct element while preserving original order.
- If key is None, distinctness is by the element itself (assume elements are hashable). If key is provided, distinctness is by key(x).
- If treat_nan_as_equal=True, treat all NaN floats as equal for distinctness even though NaN != NaN in Python.
- Target O(n) time and O(m) space, where m is the number of distinct keys seen. The implementation must be lazy (streaming) over the input and must not sort.
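A minimal sketch meeting these constraints, assuming a sentinel-object trick for NaN (the _NAN name is an invention): a seen-set gives O(1) membership checks, and yielding from a generator keeps the whole pipeline lazy.

```python
import math

def unique_stable(iterable, key=None, treat_nan_as_equal=False):
    # Lazy, order-preserving de-duplication: O(n) time, O(m) space,
    # where m is the number of distinct keys seen so far.
    seen = set()
    _NAN = object()  # single sentinel so every NaN maps to one key
    for x in iterable:
        k = x if key is None else key(x)
        if treat_nan_as_equal and isinstance(k, float) and math.isnan(k):
            k = _NAN  # collapse all NaNs, since NaN != NaN by default
        if k not in seen:
            seen.add(k)
            yield x
```

Without the sentinel, distinct NaN objects would each pass the `k not in seen` check (set lookup falls back to equality after identity, and NaN is unequal to every other NaN object), which is exactly the behavior treat_nan_as_equal=True is meant to override.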
Provide tight big-O bounds and brief tests covering edge cases: empty input, all duplicates, Unicode strings, very large inputs, and NaN handling.