Design Algorithm for Longest Substring with K Distinct Characters
Company: Upstart
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
##### Scenario
Tech interview round 2 – sliding-window algorithm
##### Question
Design an algorithm that finds the length of the longest substring containing at most K distinct characters in a given string.
##### Hints
Maintain left/right pointers and a hash-map of character counts; shrink window when distinct count exceeds K.
Quick Answer: This question evaluates proficiency in string-processing and sliding-window algorithm patterns, assessing competency in designing efficient solutions and managing data structures to track distinct elements and reason about time and space complexity.
Given a string s and an integer k, return the length of the longest contiguous substring that contains at most k distinct characters. If k is 0 or the string is empty, return 0.
Constraints
- 0 <= len(s) <= 200000
- 0 <= k <= len(s)
- Characters are case-sensitive
- Substring must be contiguous
- Aim for O(n) time and O(min(n, alphabet_size)) space
Solution
def longest_substring_k_distinct(s: str, k: int) -> int:
if k <= 0 or not s:
return 0
left = 0
counts = {}
distinct = 0
best = 0
for right, ch in enumerate(s):
counts[ch] = counts.get(ch, 0) + 1
if counts[ch] == 1:
distinct += 1
while distinct > k:
left_ch = s[left]
counts[left_ch] -= 1
if counts[left_ch] == 0:
distinct -= 1
del counts[left_ch]
left += 1
if right - left + 1 > best:
best = right - left + 1
return best
Explanation
Use a sliding window with a hash map tracking character frequencies in the current window. Expand the right pointer to include new characters. If the number of distinct characters exceeds k, move the left pointer forward, decrementing counts and removing characters whose count drops to zero, until the window satisfies the constraint again. Track the maximum window length throughout.
Time complexity: O(n). Space complexity: O(min(n, alphabet_size)).
Hints
- Use a sliding window with two pointers (left and right).
- Maintain a hash map of character counts within the current window.
- When the number of distinct characters exceeds k, move left forward and decrement counts until it is at most k.
- Update the best length after reestablishing the constraint.