This question evaluates proficiency in string parsing and compression techniques, specifically run-length encoding/decoding, pattern detection, and algorithmic optimization for producing minimal-length encodings under a maximum pattern length constraint.
You are given a run-length encoding (RLE) format that compresses a string by grouping consecutive repetitions of a short substring.
An encoded string is a concatenation of tokens. Each token has the form:
<count><pattern>
count
is a positive integer written in base-10 (may be multiple digits, no
+
sign).
pattern
is a non-empty string of lowercase letters
a-z
.
pattern
repeated
count
times.
Tokens are concatenated directly. Since digits appear only in count and pattern contains only letters, boundaries are determined by digit vs. letter transitions.
"2aab3ab" decodes to "aab" + "aab" + "ab" + "ab" + "ab" = "aabaabababab".
Implement:
decode(encoded: str) -> str
Given a valid encoded string, return the decoded original string.
Implement:
encode(s: str, K: int) -> str
You must produce an encoded string that decodes to exactly s, under the constraint:
pattern
used in any token must have length
≤ K
.
Among all valid encodings, return one with minimum total encoded length (fewest characters in the encoded string). If multiple shortest encodings exist, you may return any one.
For s = "aabaabababab", K = 3, one shortest encoding is:
"2aab3ab"
because it represents 2 * "aab" followed by 3 * "ab", and each pattern length is ≤ 3.
s
contains only lowercase letters
a-z
.
1 ≤ K ≤ len(s)
.
count
can be larger than 9 (multi-digit).