You are given a run-length encoding (RLE) format that compresses a string by grouping consecutive repetitions of a short substring.
Encoding format
An encoded string is a concatenation of tokens. Each token has the form:
-
<count><pattern>
-
count
is a positive integer written in base-10 (may be multiple digits, no
+
sign).
-
pattern
is a non-empty string of lowercase letters
a-z
.
-
The decoded expansion of the token is
pattern
repeated
count
times.
Tokens are concatenated directly. Since digits appear only in count and pattern contains only letters, boundaries are determined by digit vs. letter transitions.
Example
"2aab3ab" decodes to "aab" + "aab" + "ab" + "ab" + "ab" = "aabaabababab".
Task A — Decode
Implement:
-
decode(encoded: str) -> str
Given a valid encoded string, return the decoded original string.
Task B — Encode (optimized)
Implement:
-
encode(s: str, K: int) -> str
You must produce an encoded string that decodes to exactly s, under the constraint:
-
Every
pattern
used in any token must have length
≤ K
.
Among all valid encodings, return one with minimum total encoded length (fewest characters in the encoded string). If multiple shortest encodings exist, you may return any one.
Example
For s = "aabaabababab", K = 3, one shortest encoding is:
because it represents 2 * "aab" followed by 3 * "ab", and each pattern length is ≤ 3.
Notes / Constraints
-
s
contains only lowercase letters
a-z
.
-
1 ≤ K ≤ len(s)
.
-
count
can be larger than 9 (multi-digit).
-
Your encoding must use only the token format above; i.e., you cannot emit raw uncounted literals.