Find shortest substring with N unique characters
Company: Meta
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
You are given a string `s` and an integer `n`.
Find the **shortest contiguous substring** of `s` that contains **exactly `n` distinct characters**.
If there is no such substring, return an empty string (or `-1`, depending on how you choose to represent "not found"). If multiple substrings have the same minimal length, returning any one of them is acceptable.
### Requirements
- Input:
- `s`: a non-empty string composed of ASCII letters (you may assume lowercase English letters if you want to simplify).
- `n`: an integer, `1 <= n <= 26` (assuming lowercase letters) and `n <= number of distinct characters in s` if a valid answer exists.
- Output:
- The shortest substring of `s` that contains exactly `n` distinct characters, or an indication that no such substring exists.
### Examples
- Example 1:
- `s = "aabcbc"`, `n = 2`
- Valid substrings with exactly 2 distinct characters include: `"aa"`, `"ab"`, `"bc"`, `"cb"`, etc.
- The shortest length is 2; any of `"aa"`, `"ab"`, `"bc"`, `"cb"` is acceptable.
- Example 2:
- `s = "abac"`, `n = 3`
- Substrings with exactly 3 distinct characters: `"aba"` (a, b), `"abac"` (a, b, c), `"bac"` (b, a, c)
- The shortest is `"bac"` of length 3.
- Example 3:
- `s = "aaaa"`, `n = 2`
- There is no substring with exactly 2 distinct characters; return empty string or `-1`.
### Task
1. Specify the algorithm to solve this problem, including:
- Time and space complexity.
- How you handle edge cases such as `n = 0`, `n = 1`, `n > distinct characters in s`, and very long strings.
2. Then implement the algorithm in a programming language of your choice.
Quick Answer: This question evaluates proficiency in string processing, handling character-uniqueness constraints, and algorithmic thinking about time and space efficiency.
You are given a string `s` and an integer `n`.
Find the **shortest contiguous substring** of `s` that contains **exactly `n` distinct characters**, and return it. If multiple substrings tie for the minimal length, return the **leftmost** one (the one that starts earliest). If no such substring exists, return the empty string `""`.
### Requirements
- `s` is a string of ASCII letters (it may be empty).
- `n` is an integer. If `n <= 0`, or `s` is empty, or there is no substring with exactly `n` distinct characters, return `""`.
- A substring with **exactly** `n` distinct characters means the set of characters appearing in it has size exactly `n` (repeats of the same character do not add to the distinct count).
### Examples
- `s = "aabcbc", n = 2` -> `"ab"` (length-2 windows with exactly 2 distinct: "ab", "bc", "cb"; leftmost is "ab". Note "aa" has only 1 distinct character.)
- `s = "abac", n = 3` -> `"bac"` (shortest window with 3 distinct, length 3)
- `s = "aaaa", n = 2` -> `""` (only 1 distinct character exists)
- `s = "abcde", n = 1` -> `"a"`
### Task
Implement an efficient sliding-window solution. State its time and space complexity and explain how you handle `n = 0`, `n = 1`, `n` greater than the number of distinct characters in `s`, and very long strings.
Constraints
- 0 <= len(s); s consists of ASCII letters (may be empty).
- n may be any integer; n <= 0, empty s, or no valid window all return "".
- If a valid answer exists, 1 <= n <= number of distinct characters in s.
- On ties for minimal length, the leftmost (earliest-starting) substring is returned.
Examples
Input: ("aabcbc", 2)
Expected Output: "ab"
Explanation: Length-2 windows with exactly 2 distinct: "ab", "bc", "cb". Leftmost is "ab". ("aa" has only 1 distinct character.)
Input: ("abac", 3)
Expected Output: "bac"
Explanation: The shortest window containing 3 distinct characters {a,b,c} is "bac" at indices 1..3, length 3.
Input: ("aaaa", 2)
Expected Output: ""
Explanation: s has only 1 distinct character, so no substring has exactly 2 distinct; return empty string.
Input: ("abcde", 1)
Expected Output: "a"
Explanation: Any single character is a length-1 window with exactly 1 distinct; the leftmost is "a".
Input: ("", 1)
Expected Output: ""
Explanation: Empty input string has no substrings; return empty string.
Input: ("aabbcc", 0)
Expected Output: ""
Explanation: n = 0 is not a valid distinct count; return empty string.
Input: ("xyzzyx", 3)
Expected Output: "xyz"
Explanation: The leftmost minimal window with 3 distinct {x,y,z} is the prefix "xyz", length 3.
Input: ("aaab", 5)
Expected Output: ""
Explanation: Only 2 distinct characters exist (a,b), fewer than n=5; no valid window, return empty string.
Hints
- A window has 'exactly n distinct' which is harder than 'at most n'. Maintain a hash map of character counts and a running distinct-count for the current window [left, right].
- Grow the window by moving right one step at a time. Whenever distinct exceeds n, shrink from the left. Also shrink while the leftmost character is redundant (count > 1) so the recorded window is as tight as possible.
- Whenever distinct == n, the current window is a candidate; record it if it is strictly shorter than the best seen so far (strict comparison keeps the leftmost on ties). Handle n <= 0, empty s, and 'no valid window' by returning the empty string.