Implement SQL Table and DNA Ordering
Company: Microsoft
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates a candidate's skills in robust input parsing, in-memory data modeling, API design, and sequence reconstruction from linked fragments, covering string handling, edge-case handling for CSV, and graph-like ordering of sequence fragments.
Part 1: Manual CSV-Backed In-Memory Table Query
Constraints
- 0 <= len(csv_text) <= 200000
- Header names must be unique and non-empty
- 0 <= len(projected_columns) <= number of columns
- Cell values are treated as strings after CSV decoding
- Using built-in CSV parsers is not allowed
Examples
Input: ('id,name,city\n1,Alice,Paris\n2,Bob,London', ['name', 'city'], None)
Expected Output: [{'name': 'Alice', 'city': 'Paris'}, {'name': 'Bob', 'city': 'London'}]
Explanation: Basic parsing and projection.
Input: ('id,quote\n1,"Hello, ""DNA"""', ['quote'], ('id', '1'))
Expected Output: [{'quote': 'Hello, "DNA"'}]
Explanation: Handles a quoted field containing both a comma and an escaped quote.
Input: ('id,notes,tag\n1,"line1\nline2",x\n2,,y', ['id', 'notes'], ('tag', 'x'))
Expected Output: [{'id': '1', 'notes': 'line1\nline2'}]
Explanation: Handles a newline inside a quoted field and an empty field in another row.
Input: ('a,b\n', ['a'], None)
Expected Output: []
Explanation: Edge case: header only, so there are no result rows.
Input: ('a,b\n1,"oops', ['a'], None)
Expected Output: 'ERROR'
Explanation: Malformed CSV because the quoted field is never closed.
Hints
- A small state machine is the safest way to parse CSV: start-of-field, inside-unquoted, inside-quoted, and after-closing-quote.
- After a quoted field closes, the next character must be a comma, a newline, or the end of the string.
Part 2: Order Directed DNA Fragments
Constraints
- 0 <= len(fragments) <= 200000
- Each `start`, `end`, and `payload` is a string
- Expected solution should be linear in the number of fragments
Examples
Input: ([{'start': 'bbb', 'end': 'ccc', 'payload': '1'}, {'start': 'aaa', 'end': 'bbb', 'payload': '2'}, {'start': 'ccc', 'end': 'ddd', 'payload': '3'}],)
Expected Output: '213'
Explanation: The chain is aaa -> bbb -> ccc -> ddd, so the payload order is 2, 1, 3.
Input: ([{'start': 'left', 'end': 'right', 'payload': 'X'}],)
Expected Output: 'X'
Explanation: Edge case: a single fragment is already the full chain.
Input: ([],)
Expected Output: ''
Explanation: Edge case: no fragments produce an empty payload string.
Input: ([{'start': 'm', 'end': 'n', 'payload': 'C'}, {'start': 'l', 'end': 'm', 'payload': 'A'}, {'start': 'n', 'end': 'o', 'payload': 'G'}],)
Expected Output: 'ACG'
Explanation: The ordered chain is l -> m -> n -> o.
Hints
- In a valid directed chain, the head is the only `start` value that never appears as an `end`.
- A hash map from `start` to fragment lets you walk the chain in O(n) time.
Part 3: Order DNA Fragments with Unlabeled Endpoints
Constraints
- 0 <= len(fragments) <= 200000
- Each endpoint label and payload is a string
- A valid chain has exactly two endpoints of degree 1 and all other nodes of degree 2
Examples
Input: ([{'a': 'bbb', 'b': 'ccc', 'payload': '1'}, {'a': 'aaa', 'b': 'bbb', 'payload': '2'}, {'a': 'ddd', 'b': 'ccc', 'payload': '3'}],)
Expected Output: '213'
Explanation: The chain is aaa - bbb - ccc - ddd, giving payload orders 213 or 312. The smaller one is 213.
Input: ([{'a': 'u', 'b': 'v', 'payload': 'b'}, {'a': 'w', 'b': 'v', 'payload': 'a'}],)
Expected Output: 'ab'
Explanation: Possible orders are 'ba' and 'ab'; return the lexicographically smaller one.
Input: ([{'a': 'left', 'b': 'right', 'payload': 'Z'}],)
Expected Output: 'Z'
Explanation: Edge case: a single undirected edge gives one payload.
Input: ([],)
Expected Output: ''
Explanation: Edge case: empty input.
Hints
- In an undirected chain, the two endpoints are exactly the nodes with degree 1.
- If you recover one payload order, the other valid order is just the reverse of the payload list.
Part 4: Find All DNA Chains and Reject Invalid Inputs
Constraints
- 0 <= len(fragments) <= 200000
- Endpoint labels and payloads are strings
- Duplicate means the same unordered pair of endpoints appears more than once, regardless of payload
- Expected solution should run near linearly, aside from sorting the final component strings
Examples
Input: ([{'a': 'b', 'b': 'c', 'payload': '1'}, {'a': 'a', 'b': 'b', 'payload': '2'}, {'a': 'x', 'b': 'y', 'payload': '9'}],)
Expected Output: ['12', '9']
Explanation: There are two valid components. The path a-b-c gives payloads 21 or 12, so choose 12. The other component gives 9.
Input: ([],)
Expected Output: []
Explanation: Edge case: no fragments means no components.
Input: ([{'a': 'a', 'b': 'b', 'payload': '1'}, {'a': 'b', 'b': 'c', 'payload': '2'}, {'a': 'b', 'b': 'd', 'payload': '3'}],)
Expected Output: 'INVALID'
Explanation: Node b has degree 3, so the component branches and is not a simple chain.
Input: ([{'a': 'a', 'b': 'b', 'payload': '1'}, {'a': 'b', 'b': 'c', 'payload': '2'}, {'a': 'c', 'b': 'a', 'payload': '3'}],)
Expected Output: 'INVALID'
Explanation: This component is a cycle, not a path.
Input: ([{'a': 'a', 'b': 'b', 'payload': '1'}, {'a': 'b', 'b': 'a', 'payload': '2'}],)
Expected Output: 'INVALID'
Explanation: The same undirected edge appears twice.
Hints
- A connected component is a simple chain exactly when it has two degree-1 nodes, all other nodes have degree 2, and nodes = edges + 1.
- Normalize each edge as `(min(a, b), max(a, b))` to catch duplicates in either orientation.