Implement a Unique Lines Command
Company: Vanta
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Onsite
Quick Answer: This question evaluates mastery of file I/O, streaming algorithms, stateful line processing, and memory-efficient handling of large inputs, situating it within the Coding & Algorithms domain focused on file/stream and string-processing tasks.
Part 1: Collapse Consecutive Duplicate Lines
Constraints
- 0 <= len(lines) <= 200000
- Each line is a string and may be empty.
- The input for this part is assumed to fit in memory.
- Only adjacent equal lines should be collapsed; do not remove all duplicates globally.
Examples
Input: ['apple', 'apple', 'banana', 'apple', 'apple', 'carrot', 'carrot']
Expected Output: ['apple', 'banana', 'apple', 'carrot']
Explanation: The first two apples collapse to one, the later two apples collapse to one, and the carrots collapse to one.
Input: []
Expected Output: []
Explanation: An empty file produces an empty output.
Input: ['only']
Expected Output: ['only']
Explanation: A single line is already unique with respect to consecutive duplicates.
Input: ['', '', 'a', '', '']
Expected Output: ['', 'a', '']
Explanation: Empty lines are valid lines and consecutive empty lines should also be collapsed.
Input: ['x', 'y', 'x', 'y']
Expected Output: ['x', 'y', 'x', 'y']
Explanation: The repeated values are not consecutive, so nothing is removed.
Hints
- You only need to remember the previous line, not every line that has appeared.
- Be careful not to remove non-consecutive duplicates.
Part 2: Streaming Consecutive Line Deduplication
Constraints
- The input iterable may contain 0 to 1000000 lines.
- The iterable may be single-pass and should not be rewound.
- Each line is compared exactly as provided, including any trailing newline characters.
- The returned output is assumed to fit in memory for the tests; in a real command it would be written to an output stream.
Examples
Input: (['apple\n', 'apple\n', 'banana\n', 'banana\n', 'apple\n'],)
Expected Output: ['apple\n', 'banana\n', 'apple\n']
Explanation: The second apple and second banana are consecutive duplicates and are removed. The final apple is kept because it is not consecutive with the first apple.
Input: (['a\n', 'b\n', 'c\n'],)
Expected Output: ['a\n', 'b\n', 'c\n']
Explanation: There are no consecutive duplicate lines, so all lines are returned.
Input: ('only line\n',)
Expected Output: ['only line\n']
Explanation: A single string input is treated as one complete line, not as an iterable of characters.
Input: ([],)
Expected Output: []
Explanation: An empty input produces an empty output.
Input: (['\n', '\n', 'text\n', '\n', '\n'],)
Expected Output: ['\n', 'text\n', '\n']
Explanation: Blank lines are valid lines. Consecutive duplicate blank lines are collapsed, but the blank line after text is kept.
Input: (['same', 'same\n', 'same\n', 'same '],)
Expected Output: ['same', 'same\n', 'same ']
Explanation: Comparison is exact: missing newline and trailing space make lines different. Only the repeated 'same\n' is removed.
Input: (['x\n', 'x\n', 'x\n'],)
Expected Output: ['x\n']
Explanation: A run of identical consecutive lines is reduced to its first line.
Hints
- Do not convert the iterable to a list before processing it.
- Keep only the previous line and emit the current line when it differs.
Part 3: Memory-Limited Uniq for Very Long Chunked Lines
Constraints
- 0 <= number of lines <= 200000
- Each line contains 0 or more string chunks.
- The logical line content is the concatenation of its chunks.
- Do not concatenate all chunks of a line into one large string.
- SHA-256 collisions are ignored for this coding problem.
Examples
Input: [['app', 'le'], ['apple'], ['ban', 'ana'], ['ban', 'ana'], ['app', 'le']]
Expected Output: [['app', 'le'], ['ban', 'ana'], ['app', 'le']]
Explanation: The first two logical lines are both apple, so only the first representation is kept.
Input: []
Expected Output: []
Explanation: No chunked lines means no output.
Input: [[], [], ['x']]
Expected Output: [[], ['x']]
Explanation: An empty chunk list represents an empty line, and consecutive empty lines collapse.
Input: [['ab', 'c'], ['a', 'bc'], ['a', 'b', 'd'], ['abd'], ['ab', 'c']]
Expected Output: [['ab', 'c'], ['a', 'b', 'd'], ['ab', 'c']]
Explanation: abc collapses with abc, abd collapses with abd, and the final abc remains because it is not consecutive with the first group.
Input: [[''], ['', ''], ['z'], ['', 'z']]
Expected Output: [[''], ['z']]
Explanation: Empty chunks do not change the logical content, so the first two lines are both empty and the last two are both z.
Hints
- Chunk boundaries should not affect equality: ['ab', 'c'] and ['a', 'bc'] represent the same line.
- Update a hash object with each chunk and compare the final digest and total byte length with the previous line.
Part 4: Count Consecutive Duplicate Lines
Constraints
- 0 <= len(lines) <= 200000
- Each line is a string and may be empty.
- Only consecutive equal lines belong to the same count group.
- Counts fit in a standard integer.
Examples
Input: ['apple', 'apple', 'banana', 'apple', 'apple', 'carrot', 'carrot']
Expected Output: [[2, 'apple'], [1, 'banana'], [2, 'apple'], [2, 'carrot']]
Explanation: Each consecutive run is returned with its count.
Input: []
Expected Output: []
Explanation: An empty input has no groups.
Input: ['single']
Expected Output: [[1, 'single']]
Explanation: A single line forms one group with count 1.
Input: ['x', 'x', 'x', 'x']
Expected Output: [[4, 'x']]
Explanation: All lines are the same and consecutive, so there is one group.
Input: ['', '', 'a', '', 'b', 'b']
Expected Output: [[2, ''], [1, 'a'], [1, ''], [2, 'b']]
Explanation: Empty lines are counted like any other line, and the later empty line is a separate group.
Hints
- This is run-length encoding over lines.
- When the current line changes, append the previous line's count and reset the counter.