Find shortest substring with n unique letters
Company: Microsoft
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question tests string processing, character-frequency bookkeeping, and algorithmic optimization when searching for a contiguous substring under a constraint. It is commonly asked in coding and algorithms rounds because it shows how a candidate reasons about distinct-character counts and efficiency trade-offs, and it targets applied technique rather than theory.
Constraints
- 1 ≤ n ≤ 26
- 0 ≤ |s| ≤ 10^5
- s consists only of lowercase English letters
- Return the length of the shortest substring with exactly n distinct characters, or -1 when no such substring exists
Examples
Input: ("aabcbcdbca", 3)
Expected Output: 3
Explanation: The substring "bca" (and others such as "bcd" and "dbc") has exactly 3 distinct characters and length 3, which is the shortest possible.
Input: ("aaaa", 1)
Expected Output: 1
Explanation: A single 'a' is a substring with exactly 1 distinct character, length 1.
Input: ("abc", 4)
Expected Output: -1
Explanation: The string has only 3 distinct characters, so no substring can contain exactly 4.
Input: ("", 1)
Expected Output: -1
Explanation: An empty string has no non-empty substrings, so none has exactly 1 distinct character.
Input: ("abcabc", 3)
Expected Output: 3
Explanation: "abc" has exactly 3 distinct characters, and any substring with 3 distinct characters needs at least 3 letters, so 3 is the minimum.
Input: ("aaabbbccc", 2)
Expected Output: 2
Explanation: The transition "ab" or "bc" gives exactly 2 distinct characters in length 2, the minimum possible.
Input: ("xyzzyx", 2)
Expected Output: 2
Explanation: Adjacent differing pairs like "xy", "yz", or "zy" have exactly 2 distinct characters with length 2.
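The expected outputs above can be checked with a brute-force reference. The following is a minimal sketch, not an intended solution: the function name `shortest_len_brute_force` and the `examples` list are illustrative assumptions, and the cubic worst-case running time makes it suitable only for short strings, not the full |s| ≤ 10^5 range.

```python
def shortest_len_brute_force(s: str, n: int) -> int:
    """Reference answer: try candidate lengths from shortest to longest."""
    # A substring with n distinct characters needs at least n letters,
    # so lengths below n can be skipped.
    for length in range(n, len(s) + 1):
        for start in range(len(s) - length + 1):
            # set() collapses duplicates, leaving the distinct characters.
            if len(set(s[start:start + length])) == n:
                return length
    return -1


# The (input, n, expected) triples listed in the Examples section.
examples = [
    ("aabcbcdbca", 3, 3),
    ("aaaa", 1, 1),
    ("abc", 4, -1),
    ("", 1, -1),
    ("abcabc", 3, 3),
    ("aaabbbccc", 2, 2),
    ("xyzzyx", 2, 2),
]

for text, n, expected in examples:
    assert shortest_len_brute_force(text, n) == expected
```

Because lengths are tried in increasing order, the first match is the shortest, which makes this a convenient oracle for testing a faster solution on small random inputs.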
Hints
- A sliding window over s lets you track the count of distinct characters as you move the right edge.
- Whenever the window contains more than n distinct characters, advance the left edge and decrement counts until you are back to at most n.
- When the window has exactly n distinct characters, you can still shrink it: drop leading characters whose count is greater than 1, since removing them does not reduce the distinct count.
- If you never reach a window with exactly n distinct characters (e.g. the string has fewer than n distinct letters), the answer is -1.
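The hints above can be combined into a single pass. This is one possible sketch under the stated constraints, not a definitive implementation; the function name `shortest_len_sliding_window` and its variable names are illustrative assumptions.

```python
from collections import defaultdict


def shortest_len_sliding_window(s: str, n: int) -> int:
    """Length of the shortest substring with exactly n distinct characters, or -1."""
    counts = defaultdict(int)  # occurrences of each character inside the window
    distinct = 0               # number of characters with a non-zero count
    left = 0                   # left edge of the window (inclusive)
    best = -1                  # shortest valid length seen so far

    for right, ch in enumerate(s):
        # Extend the window to include s[right].
        if counts[ch] == 0:
            distinct += 1
        counts[ch] += 1

        # Too many distinct characters: advance the left edge.
        while distinct > n:
            counts[s[left]] -= 1
            if counts[s[left]] == 0:
                distinct -= 1
            left += 1

        # Drop leading characters that occur again later in the window;
        # removing them does not change the distinct count.
        while counts[s[left]] > 1:
            counts[s[left]] -= 1
            left += 1

        if distinct == n:
            length = right - left + 1
            if best == -1 or length < best:
                best = length

    return best


assert shortest_len_sliding_window("aabcbcdbca", 3) == 3
assert shortest_len_sliding_window("abc", 4) == -1
```

Each index enters the window once and leaves it at most once, so the running time is linear in |s|, and the count table holds at most 26 entries for lowercase English input.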