PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/Coding & Algorithms/Google

Implement anagram check and stable deduplication

Last updated: Mar 29, 2026

Quick Overview

This question evaluates streaming algorithm design, Unicode normalization and locale-aware text processing, multiset frequency counting under tight time and memory bounds for an anagram checker, and stable, order-preserving deduplication with custom keys and NaN handling; it is in the Coding & Algorithms domain for a Data Scientist position.

  • Medium
  • Google
  • Coding & Algorithms
  • Data Scientist

Implement anagram check and stable deduplication

Company: Google

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Part A — Anagram checker: Write a function is_anagram(a: str, b: str, locale: str = 'en') -> bool that returns True iff a and b are anagrams under the following rules: - Ignore case, whitespace, and Unicode punctuation. - Normalize Unicode using NFKD; when locale='en', strip combining marks so accented letters compare equal to their base letters (e.g., 'é' -> 'e'). - Consider only letters and digits after normalization. - Must work for inputs up to 10^7 characters each with O(σ) additional memory where σ is the alphabet size under the rules above, and O(n) time. Streaming solutions that do a single pass over each string are preferred; do not materialize full normalized strings. - Return False immediately if, after filtering, the multiset lengths differ. Follow-ups: 1) How would you adapt it to be locale-aware where 'ß' in German equals 'ss'? 2) How to handle right-to-left scripts without breaking normalization? Part B — Stable de-duplication: Write a function unique_stable(iterable, key=None, treat_nan_as_equal=False) that returns a lazy generator yielding the first occurrence of each distinct element while preserving original order. - If key is None, distinctness is by the element itself (assume elements are hashable). If key is provided, distinctness is by key(x). - If treat_nan_as_equal=True, treat all NaN floats as equal for distinctness even though NaN != NaN in Python. - Target O(n) time and O(m) space where m is the number of distinct keys seen. The implementation must be lazy (streaming) over the input and must not sort. Provide tight big-O bounds, and brief tests covering edge cases like empty input, all-duplicates, Unicode strings, very large inputs, and NaN handling.

Quick Answer: This question evaluates streaming algorithm design, Unicode normalization and locale-aware text processing, multiset frequency counting under tight time and memory bounds for an anagram checker, and stable, order-preserving deduplication with custom keys and NaN handling; it is in the Coding & Algorithms domain for a Data Scientist position.

Related Interview Questions

  • Solve Rooms and Top-K Streams - Google (medium)
  • Find Containing Range - Google (medium)
  • Rearrange Tasks With Cooldown - Google (medium)
  • Implement Employee Management and Expression Evaluation - Google (medium)
  • Solve Three Array and Matrix Path Problems - Google (medium)
Google logo
Google
Oct 13, 2025, 9:49 PM
Data Scientist
Technical Screen
Coding & Algorithms
3
0

Part A — Anagram checker: Write a function is_anagram(a: str, b: str, locale: str = 'en') -> bool that returns True iff a and b are anagrams under the following rules:

  • Ignore case, whitespace, and Unicode punctuation.
  • Normalize Unicode using NFKD; when locale='en', strip combining marks so accented letters compare equal to their base letters (e.g., 'é' -> 'e').
  • Consider only letters and digits after normalization.
  • Must work for inputs up to 10^7 characters each with O(σ) additional memory where σ is the alphabet size under the rules above, and O(n) time. Streaming solutions that do a single pass over each string are preferred; do not materialize full normalized strings.
  • Return False immediately if, after filtering, the multiset lengths differ. Follow-ups:
  1. How would you adapt it to be locale-aware where 'ß' in German equals 'ss'? 2) How to handle right-to-left scripts without breaking normalization?

Part B — Stable de-duplication: Write a function unique_stable(iterable, key=None, treat_nan_as_equal=False) that returns a lazy generator yielding the first occurrence of each distinct element while preserving original order.

  • If key is None, distinctness is by the element itself (assume elements are hashable). If key is provided, distinctness is by key(x).
  • If treat_nan_as_equal=True, treat all NaN floats as equal for distinctness even though NaN != NaN in Python.
  • Target O(n) time and O(m) space where m is the number of distinct keys seen. The implementation must be lazy (streaming) over the input and must not sort. Provide tight big-O bounds, and brief tests covering edge cases like empty input, all-duplicates, Unicode strings, very large inputs, and NaN handling.

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More Coding & Algorithms•More Google•More Data Scientist•Google Data Scientist•Google Coding & Algorithms•Data Scientist Coding & Algorithms
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.