PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates mastery of file I/O, streaming algorithms, stateful line processing, and memory-efficient handling of large inputs, situating it within the Coding & Algorithms domain focused on file/stream and string-processing tasks.

  • easy
  • Vanta
  • Coding & Algorithms
  • Software Engineer

Implement a Unique Lines Command

Company: Vanta

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Onsite

Implement a simplified version of the Unix `uniq` command. Given an input text file, output the file contents with consecutive duplicate lines collapsed into a single line. Example: Input file: ```text apple apple banana apple apple carrot carrot ``` Output: ```text apple banana apple carrot ``` Requirements: - Only consecutive duplicate lines should be collapsed. - Non-consecutive duplicates should remain. - Preserve the original order of lines. - The implementation should work for files that may be too large to fit entirely in memory. Follow-up questions: - How would you implement this using streaming file I/O? - How would you handle very long lines or memory limits? - How would you extend the command to support counting repeated consecutive lines, similar to `uniq -c`?

Quick Answer: This question evaluates mastery of file I/O, streaming algorithms, stateful line processing, and memory-efficient handling of large inputs, situating it within the Coding & Algorithms domain focused on file/stream and string-processing tasks.

Part 1: Collapse Consecutive Duplicate Lines

Implement a simplified version of the Unix uniq command. Given the full contents of a text file as a list of lines, return a new list where each run of consecutive duplicate lines is collapsed into a single line. Duplicate lines that are not consecutive must remain in the output.

Constraints

  • 0 <= len(lines) <= 200000
  • Each line is a string and may be empty.
  • The input for this part is assumed to fit in memory.
  • Only adjacent equal lines should be collapsed; do not remove all duplicates globally.

Examples

Input: ['apple', 'apple', 'banana', 'apple', 'apple', 'carrot', 'carrot']

Expected Output: ['apple', 'banana', 'apple', 'carrot']

Explanation: The first two apples collapse to one, the later two apples collapse to one, and the carrots collapse to one.

Input: []

Expected Output: []

Explanation: An empty file produces an empty output.

Input: ['only']

Expected Output: ['only']

Explanation: A single line is already unique with respect to consecutive duplicates.

Input: ['', '', 'a', '', '']

Expected Output: ['', 'a', '']

Explanation: Empty lines are valid lines and consecutive empty lines should also be collapsed.

Input: ['x', 'y', 'x', 'y']

Expected Output: ['x', 'y', 'x', 'y']

Explanation: The repeated values are not consecutive, so nothing is removed.

Hints

  1. You only need to remember the previous line, not every line that has appeared.
  2. Be careful not to remove non-consecutive duplicates.

Part 2: Streaming Consecutive Line Deduplication

Implement consecutive duplicate line removal in a streaming style. The input may represent a file that is too large to load fully into memory, so your algorithm must process lines one at a time from an iterable. For testing, the function returns the produced lines as a list, but the algorithm should not require random access to the input.

Constraints

  • The input iterable may contain 0 to 1000000 lines.
  • The iterable may be single-pass and should not be rewound.
  • Each line is compared exactly as provided, including any trailing newline characters.
  • The returned output is assumed to fit in memory for the tests; in a real command it would be written to an output stream.

Examples

Input: (['apple\n', 'apple\n', 'banana\n', 'banana\n', 'apple\n'],)

Expected Output: ['apple\n', 'banana\n', 'apple\n']

Explanation: The second apple and second banana are consecutive duplicates and are removed. The final apple is kept because it is not consecutive with the first apple.

Input: (['a\n', 'b\n', 'c\n'],)

Expected Output: ['a\n', 'b\n', 'c\n']

Explanation: There are no consecutive duplicate lines, so all lines are returned.

Input: ('only line\n',)

Expected Output: ['only line\n']

Explanation: A single string input is treated as one complete line, not as an iterable of characters.

Input: ([],)

Expected Output: []

Explanation: An empty input produces an empty output.

Input: (['\n', '\n', 'text\n', '\n', '\n'],)

Expected Output: ['\n', 'text\n', '\n']

Explanation: Blank lines are valid lines. Consecutive duplicate blank lines are collapsed, but the blank line after text is kept.

Input: (['same', 'same\n', 'same\n', 'same '],)

Expected Output: ['same', 'same\n', 'same ']

Explanation: Comparison is exact: missing newline and trailing space make lines different. Only the repeated 'same\n' is removed.

Input: (['x\n', 'x\n', 'x\n'],)

Expected Output: ['x\n']

Explanation: A run of identical consecutive lines is reduced to its first line.

Hints

  1. Do not convert the iterable to a list before processing it.
  2. Keep only the previous line and emit the current line when it differs.

Part 3: Memory-Limited Uniq for Very Long Chunked Lines

Implement consecutive duplicate removal when individual lines may be very long and should not be concatenated into a single string. Each logical line is given as a list of string chunks. Two lines are equal if the concatenation of their chunks is equal, regardless of how the chunks are split. Return the first representation in each consecutive run. To model a memory-limited implementation, compute a fingerprint of each line chunk by chunk instead of joining all chunks. For this problem, assume SHA-256 plus byte length is collision-free for the provided tests.

Constraints

  • 0 <= number of lines <= 200000
  • Each line contains 0 or more string chunks.
  • The logical line content is the concatenation of its chunks.
  • Do not concatenate all chunks of a line into one large string.
  • SHA-256 collisions are ignored for this coding problem.

Examples

Input: [['app', 'le'], ['apple'], ['ban', 'ana'], ['ban', 'ana'], ['app', 'le']]

Expected Output: [['app', 'le'], ['ban', 'ana'], ['app', 'le']]

Explanation: The first two logical lines are both apple, so only the first representation is kept.

Input: []

Expected Output: []

Explanation: No chunked lines means no output.

Input: [[], [], ['x']]

Expected Output: [[], ['x']]

Explanation: An empty chunk list represents an empty line, and consecutive empty lines collapse.

Input: [['ab', 'c'], ['a', 'bc'], ['a', 'b', 'd'], ['abd'], ['ab', 'c']]

Expected Output: [['ab', 'c'], ['a', 'b', 'd'], ['ab', 'c']]

Explanation: abc collapses with abc, abd collapses with abd, and the final abc remains because it is not consecutive with the first group.

Input: [[''], ['', ''], ['z'], ['', 'z']]

Expected Output: [[''], ['z']]

Explanation: Empty chunks do not change the logical content, so the first two lines are both empty and the last two are both z.

Hints

  1. Chunk boundaries should not affect equality: ['ab', 'c'] and ['a', 'bc'] represent the same line.
  2. Update a hash object with each chunk and compare the final digest and total byte length with the previous line.

Part 4: Count Consecutive Duplicate Lines

Extend the simplified uniq command to support counting, similar to uniq -c. Given a list of lines, group only consecutive equal lines. For each consecutive run, return the run length together with the line value.

Constraints

  • 0 <= len(lines) <= 200000
  • Each line is a string and may be empty.
  • Only consecutive equal lines belong to the same count group.
  • Counts fit in a standard integer.

Examples

Input: ['apple', 'apple', 'banana', 'apple', 'apple', 'carrot', 'carrot']

Expected Output: [[2, 'apple'], [1, 'banana'], [2, 'apple'], [2, 'carrot']]

Explanation: Each consecutive run is returned with its count.

Input: []

Expected Output: []

Explanation: An empty input has no groups.

Input: ['single']

Expected Output: [[1, 'single']]

Explanation: A single line forms one group with count 1.

Input: ['x', 'x', 'x', 'x']

Expected Output: [[4, 'x']]

Explanation: All lines are the same and consecutive, so there is one group.

Input: ['', '', 'a', '', 'b', 'b']

Expected Output: [[2, ''], [1, 'a'], [1, ''], [2, 'b']]

Explanation: Empty lines are counted like any other line, and the later empty line is a separate group.

Hints

  1. This is run-length encoding over lines.
  2. When the current line changes, append the previous line's count and reset the counter.
Last updated: Jun 13, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a uniq-like function - Vanta (medium)
  • Design test-run logging and query functions - Vanta (Medium)
  • Implement test failure analytics APIs - Vanta (Medium)
  • Design test logging and failure-window queries - Vanta (Medium)
  • Design streaming test logger and queries - Vanta (Medium)