Implement and use a version comparator
Company: Nextdoor
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: Implement and use a version comparator evaluates algorithm design, data structures, correctness, complexity, edge cases, and implementation details in a realistic interview setting. A strong answer states assumptions, handles edge cases, explains trade-offs, and shows how to validate the result clearly.
Constraints
- 0 <= number of identifiers <= 100000
- Each identifier is a non-empty string of numeric segments separated by '.', optionally followed by a '-' pre-release tag and/or a '+' build-metadata suffix.
- Numeric segments fit in a 64-bit integer; leading zeros are allowed.
- Pre-release names considered: alpha, beta, rc (with an optional trailing integer); any other tag is treated as ranking after rc.
- Sort must be stable: equal identifiers keep their original relative order.
Examples
Input: (["1.0", "01.2.0", "2.0.0-alpha", "1.10.3"],)
Expected Output: ['1.0', '01.2.0', '1.10.3', '2.0.0-alpha']
Explanation: Numeric cores: 1.0 < 1.2.0 (01 == 1) < 1.10.3 < 2.0.0. The pre-release '-alpha' on 2.0.0 only matters relative to a 2.0.0 release, which is absent, so it stays last.
Input: (["1.0.0", "1.0.0-alpha", "1.0.0-beta", "1.0.0-rc1"],)
Expected Output: ['1.0.0-alpha', '1.0.0-beta', '1.0.0-rc1', '1.0.0']
Explanation: Same numeric core 1.0.0. Pre-releases sort before the release in the order alpha < beta < rc, and the plain '1.0.0' release comes last.
Input: (["1.0.0+build5", "1.0.0+build1", "1.0.0"],)
Expected Output: ['1.0.0+build5', '1.0.0+build1', '1.0.0']
Explanation: Build metadata after '+' is ignored for ordering, so all three compare equal; the stable sort preserves their original input order.
Input: (["1", "1.0", "1.0.0"],)
Expected Output: ['1', '1.0', '1.0.0']
Explanation: Missing trailing segments are treated as 0, so all three are numerically equal; stable sort keeps input order.
Input: (["2.0.0-rc2", "2.0.0-rc10", "2.0.0-rc1"],)
Expected Output: ['2.0.0-rc1', '2.0.0-rc2', '2.0.0-rc10']
Explanation: Within rc tags the trailing number is compared as an integer, so rc1 < rc2 < rc10 (not lexicographically, where rc10 would precede rc2).
Input: ([],)
Expected Output: []
Explanation: Empty input returns an empty list.
Input: (["1.10.3", "1.2.0", "1.9.9"],)
Expected Output: ['1.2.0', '1.9.9', '1.10.3']
Explanation: Second segment compared as integers: 2 < 9 < 10, so 1.2.0 < 1.9.9 < 1.10.3 (lexicographic string compare would wrongly put 1.10.3 first).
Hints
- Strip build metadata (everything from '+' onward) first; it never affects ordering.
- Split the numeric core on '.' and compare segment-by-segment as integers, treating any missing segment in the shorter version as 0. Parsing each segment as an int automatically handles leading zeros (01 -> 1).
- Only after the numeric cores tie does the pre-release tag matter: map it to a sortable key where a missing tag (a real release) ranks ABOVE every pre-release, and alpha < beta < rc, with rc's trailing number compared numerically (rc2 < rc10).
- Use a custom comparator (functools.cmp_to_key in Python, Comparator in Java, a lambda in C++, or Array.prototype.sort's compare fn in JS) and rely on a stable sort for ties.