Find longest substring with at least k repeats
Company: C3 AI
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Hard
Interview Round: Technical Screen
# Find longest substring with at least k repeats
### Longest substring where each character appears at least *k* times
You are given:
- a string `s` consisting of lowercase English letters
- an integer `k`
Return the **length of the longest substring** of `s` such that **every distinct character in that substring appears at least `k` times** within the substring.
If no such substring exists, return `-1`.
#### Notes / assumptions
- A substring is contiguous.
- If `k <= 1`, the answer is `len(s)`.
- You should handle edge cases like empty string, `k > len(s)`, and repeated characters.
#### Example
- `s = "aaabb", k = 3` → longest valid substring is `"aaa"` → return `3`.
### Constraints & Assumptions
- Preserve the scope, facts, inputs, and requested outputs from the prompt above.
- If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
- Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.
### Clarifying Questions to Ask
- Clarify input sizes, value ranges, mutability, return format, and tie-breaking.
- State the target time and space complexity before coding.
- Call out edge cases such as empty inputs, duplicates, invalid values, overflow, and boundary sizes.
### What a Strong Answer Covers
- A clear algorithm with the right data structures and enough pseudocode or code-level detail to implement it.
- A correctness argument that explains why the algorithm covers all required cases.
- Time and space complexity, plus at least one alternative approach when relevant.
- Focused tests for normal cases, edge cases, and failure modes.
### Follow-up Questions
- How would the approach change if the input were streaming or too large for memory?
- What invariants would you assert in production code?
- Which tests would catch off-by-one, duplicate, or tie-breaking bugs?
Quick Answer: This interview question evaluates algorithm design, data structures, correctness, complexity, edge cases, and implementation details in a realistic interview setting. A strong answer for Find longest substring with at least k repeats states assumptions, handles edge cases, explains trade-offs, and shows how to validate the result clearly.
Return the length of the longest substring where every distinct character appears at least k times, or -1 if none exists.
Constraints
- Inputs are Python literals matching the function signature.
- Return a deterministic exact-match value.
Examples
Input: ('aaabb', 3)
Expected Output: 3
Explanation: Prompt example.
Input: ('ababbc', 2)
Expected Output: 5
Explanation: Split around low-count c.
Input: ('', 2)
Expected Output: -1
Explanation: Empty string.
Input: ('abc', 1)
Expected Output: 3
Explanation: k <= 1 returns length.
Hints
- Choose a representation that makes the requested operation direct.
- Handle empty inputs and boundary cases first.