PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates a candidate's skills in robust input parsing, in-memory data modeling, API design, and sequence reconstruction from linked fragments, covering string handling, edge-case handling for CSV, and graph-like ordering of sequence fragments.

  • medium
  • Microsoft
  • Coding & Algorithms
  • Software Engineer

Implement SQL Table and DNA Ordering

Company: Microsoft

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

Two independent coding exercises were asked. Solve both. ### Problem A: CSV-backed in-memory table Build a tiny in-memory relational table initialized from a CSV string. Requirements: - The first CSV row contains column names; subsequent rows contain values. - Implement the CSV parser yourself. It must handle comma-separated fields, empty fields, quoted fields, commas inside quoted fields, escaped quotes represented as two consecutive double quotes, and newline characters inside quoted fields. Reject malformed CSV. - Store the parsed data in memory. - Expose a simple query API such as `select(projectedColumns, predicate)`, where `predicate` is a function evaluated on each row. Results should preserve input row order unless an explicit sort is added. - Discuss how you would extend the implementation with filtering operators, ordering, joins, or indexes. ### Problem B: Order linked DNA sequence fragments You are given fragments. In the base version, each fragment has `start`, `end`, and `payload` fields. The `end` of one fragment may equal the `start` of another fragment. All fragments form one valid chain with exactly one head and one tail. Return the concatenation of the payloads in chain order. Example: ```text [ {start: 'bbb', end: 'ccc', payload: '1'}, {start: 'aaa', end: 'bbb', payload: '2'}, {start: 'ccc', end: 'ddd', payload: '3'} ] ``` The correct chain is `aaa -> bbb -> ccc -> ddd`, so the payload order is `2, 1, 3`, and the output is `213`. Follow-ups: 1. Each fragment contains two endpoint labels and a payload, but it does not tell you which endpoint is the start and which is the end. Treat each fragment as an undirected edge and reconstruct any valid chain using all fragments. 2. The input may contain multiple disconnected chains. Detect all chains or components and return the payload sequence for each one. Also explain how you would detect invalid inputs such as cycles, branches, or duplicate edges.

Quick Answer: This question evaluates a candidate's skills in robust input parsing, in-memory data modeling, API design, and sequence reconstruction from linked fragments, covering string handling, edge-case handling for CSV, and graph-like ordering of sequence fragments.

Part 1: Manual CSV-Backed In-Memory Table Query

Write a function `solution(csv_text, projected_columns, predicate)` that parses a CSV string into an in-memory table and runs a simple query. Rules: - The first CSV row contains unique column names. - Remaining rows are data rows. - You must implement CSV parsing yourself. Do not use Python's `csv` module. - Supported CSV behavior: - comma-separated fields - empty fields - quoted fields - commas inside quoted fields - escaped quotes inside quoted fields written as `""` - newline characters inside quoted fields - Reject malformed CSV, including: - unclosed quoted fields - stray characters after a closing quote before the next comma/newline - inconsistent row lengths - duplicate or empty header names - `predicate` is either `None` or a tuple `(column_name, expected_value)` meaning equality filter. - Return only the columns listed in `projected_columns`, preserving row order. - If the CSV is malformed or a requested column does not exist, return `'ERROR'`.

Constraints

  • 0 <= len(csv_text) <= 200000
  • Header names must be unique and non-empty
  • 0 <= len(projected_columns) <= number of columns
  • Cell values are treated as strings after CSV decoding
  • Using built-in CSV parsers is not allowed

Examples

Input: ('id,name,city\n1,Alice,Paris\n2,Bob,London', ['name', 'city'], None)

Expected Output: [{'name': 'Alice', 'city': 'Paris'}, {'name': 'Bob', 'city': 'London'}]

Explanation: Basic parsing and projection.

Input: ('id,quote\n1,"Hello, ""DNA"""', ['quote'], ('id', '1'))

Expected Output: [{'quote': 'Hello, "DNA"'}]

Explanation: Handles a quoted field containing both a comma and an escaped quote.

Input: ('id,notes,tag\n1,"line1\nline2",x\n2,,y', ['id', 'notes'], ('tag', 'x'))

Expected Output: [{'id': '1', 'notes': 'line1\nline2'}]

Explanation: Handles a newline inside a quoted field and an empty field in another row.

Input: ('a,b\n', ['a'], None)

Expected Output: []

Explanation: Edge case: header only, so there are no result rows.

Input: ('a,b\n1,"oops', ['a'], None)

Expected Output: 'ERROR'

Explanation: Malformed CSV because the quoted field is never closed.

Hints

  1. A small state machine is the safest way to parse CSV: start-of-field, inside-unquoted, inside-quoted, and after-closing-quote.
  2. After a quoted field closes, the next character must be a comma, a newline, or the end of the string.

Part 2: Order Directed DNA Fragments

You are given DNA fragments that form one directed chain. Each fragment is a dictionary with fields `start`, `end`, and `payload`. If one fragment's `end` equals another fragment's `start`, the first fragment must come before the second. Reconstruct the chain order and return the concatenation of payloads. If the input is empty, return an empty string. If the fragments do not form exactly one valid directed chain, return `'INVALID'`.

Constraints

  • 0 <= len(fragments) <= 200000
  • Each `start`, `end`, and `payload` is a string
  • Expected solution should be linear in the number of fragments

Examples

Input: ([{'start': 'bbb', 'end': 'ccc', 'payload': '1'}, {'start': 'aaa', 'end': 'bbb', 'payload': '2'}, {'start': 'ccc', 'end': 'ddd', 'payload': '3'}],)

Expected Output: '213'

Explanation: The chain is aaa -> bbb -> ccc -> ddd, so the payload order is 2, 1, 3.

Input: ([{'start': 'left', 'end': 'right', 'payload': 'X'}],)

Expected Output: 'X'

Explanation: Edge case: a single fragment is already the full chain.

Input: ([],)

Expected Output: ''

Explanation: Edge case: no fragments produce an empty payload string.

Input: ([{'start': 'm', 'end': 'n', 'payload': 'C'}, {'start': 'l', 'end': 'm', 'payload': 'A'}, {'start': 'n', 'end': 'o', 'payload': 'G'}],)

Expected Output: 'ACG'

Explanation: The ordered chain is l -> m -> n -> o.

Hints

  1. In a valid directed chain, the head is the only `start` value that never appears as an `end`.
  2. A hash map from `start` to fragment lets you walk the chain in O(n) time.

Part 3: Order DNA Fragments with Unlabeled Endpoints

Now each DNA fragment has two endpoint labels and a payload, but the fragment does not say which endpoint is the start and which is the end. Treat each fragment as an undirected edge. All fragments together form exactly one valid chain and every fragment must be used once. Because an undirected chain can be read from either end, there are generally two valid payload orders. Return the lexicographically smaller concatenated payload string. If the input is empty, return an empty string. If the fragments do not form a single simple chain, return `'INVALID'`.

Constraints

  • 0 <= len(fragments) <= 200000
  • Each endpoint label and payload is a string
  • A valid chain has exactly two endpoints of degree 1 and all other nodes of degree 2

Examples

Input: ([{'a': 'bbb', 'b': 'ccc', 'payload': '1'}, {'a': 'aaa', 'b': 'bbb', 'payload': '2'}, {'a': 'ddd', 'b': 'ccc', 'payload': '3'}],)

Expected Output: '213'

Explanation: The chain is aaa - bbb - ccc - ddd, giving payload orders 213 or 312. The smaller one is 213.

Input: ([{'a': 'u', 'b': 'v', 'payload': 'b'}, {'a': 'w', 'b': 'v', 'payload': 'a'}],)

Expected Output: 'ab'

Explanation: Possible orders are 'ba' and 'ab'; return the lexicographically smaller one.

Input: ([{'a': 'left', 'b': 'right', 'payload': 'Z'}],)

Expected Output: 'Z'

Explanation: Edge case: a single undirected edge gives one payload.

Input: ([],)

Expected Output: ''

Explanation: Edge case: empty input.

Hints

  1. In an undirected chain, the two endpoints are exactly the nodes with degree 1.
  2. If you recover one payload order, the other valid order is just the reverse of the payload list.

Part 4: Find All DNA Chains and Reject Invalid Inputs

You are given undirected DNA fragments. Each fragment is a dictionary with endpoint labels `a` and `b`, plus a `payload`. Fragments may belong to multiple disconnected components. A component is valid only if it forms a simple chain using all of its edges exactly once. Your task: - Find every connected component. - For each valid chain component, produce the concatenated payloads in chain order. - Since each undirected chain can be read in either direction, choose the lexicographically smaller payload string for that component. - Return the list of component strings sorted lexicographically. Return `'INVALID'` if any of the following occurs anywhere in the input: - self-loop (`a == b`) - duplicate undirected edge between the same endpoints - branch (a node with degree greater than 2) - cycle or any connected component that is not a simple path If the input list is empty, return an empty list.

Constraints

  • 0 <= len(fragments) <= 200000
  • Endpoint labels and payloads are strings
  • Duplicate means the same unordered pair of endpoints appears more than once, regardless of payload
  • Expected solution should run near linearly, aside from sorting the final component strings

Examples

Input: ([{'a': 'b', 'b': 'c', 'payload': '1'}, {'a': 'a', 'b': 'b', 'payload': '2'}, {'a': 'x', 'b': 'y', 'payload': '9'}],)

Expected Output: ['12', '9']

Explanation: There are two valid components. The path a-b-c gives payloads 21 or 12, so choose 12. The other component gives 9.

Input: ([],)

Expected Output: []

Explanation: Edge case: no fragments means no components.

Input: ([{'a': 'a', 'b': 'b', 'payload': '1'}, {'a': 'b', 'b': 'c', 'payload': '2'}, {'a': 'b', 'b': 'd', 'payload': '3'}],)

Expected Output: 'INVALID'

Explanation: Node b has degree 3, so the component branches and is not a simple chain.

Input: ([{'a': 'a', 'b': 'b', 'payload': '1'}, {'a': 'b', 'b': 'c', 'payload': '2'}, {'a': 'c', 'b': 'a', 'payload': '3'}],)

Expected Output: 'INVALID'

Explanation: This component is a cycle, not a path.

Input: ([{'a': 'a', 'b': 'b', 'payload': '1'}, {'a': 'b', 'b': 'a', 'payload': '2'}],)

Expected Output: 'INVALID'

Explanation: The same undirected edge appears twice.

Hints

  1. A connected component is a simple chain exactly when it has two degree-1 nodes, all other nodes have degree 2, and nodes = edges + 1.
  2. Normalize each edge as `(min(a, b), max(a, b))` to catch duplicates in either orientation.
Last updated: May 6, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Return Top K Open Businesses - Microsoft (hard)
  • Implement Memory Allocation and In-Memory Records - Microsoft (medium)
  • Implement K-Means and Detect Divisible Subarrays - Microsoft (medium)
  • Sort Three Categories In Place - Microsoft (medium)
  • Retain Top K Elements - Microsoft (medium)