How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Netflix.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Netflix during technical interviews.

Implement topological sort from string input

Quick Overview

This question evaluates a candidate's skills in parsing unstructured dependency strings into directed graphs, producing a valid topological order, detecting and reporting minimal cycles, and reasoning about robustness against duplicate, unknown, or malformed inputs as well as time and space complexity.

Implement topological sort from string input

Company: Netflix

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

You receive N lines of raw text describing task dependencies, e.g., 'install A after B', 'C->A', 'B before D'. Implement a function that: ( 1) parses the strings into a directed graph; ( 2) returns one valid topological order of tasks; ( 3) detects cycles and reports a minimal cycle; ( 4) handles duplicate/unknown tasks and malformed lines gracefully. Analyze time and space complexity. How would you adapt it for streaming input and very large graphs (e.g., external memory or batched Kahn’s algorithm)?

Quick Answer: This question evaluates a candidate's skills in parsing unstructured dependency strings into directed graphs, producing a valid topological order, detecting and reporting minimal cycles, and reasoning about robustness against duplicate, unknown, or malformed inputs as well as time and space complexity.

Part 1: Parse Dependency Strings into a Directed Graph

You are given a list of raw dependency strings. Each string describes one directed dependency between two tasks using one of these formats: - 'X->Y' meaning X must happen before Y - 'X before Y' meaning X must happen before Y - 'install X after Y' meaning Y must happen before X Build and return the directed graph as a dictionary mapping each discovered task to a sorted list of its outgoing neighbors. Include tasks that have no outgoing edges. Ignore blank lines and skip malformed lines.

Constraints

0 <= len(lines) <= 10^4
Total input length across all lines <= 2 * 10^5 characters
Task names contain only letters, digits, and underscores

Examples

Input: ['install A after B', 'C->A', 'B before D']

Expected Output: {'A': [], 'B': ['A', 'D'], 'C': ['A'], 'D': []}

Explanation: Three different input formats all contribute edges to the same graph.

Input: [' Task1 -> Task2 ']

Expected Output: {'Task1': ['Task2'], 'Task2': []}

Explanation: Spaces around the arrow should be ignored.

Input: []

Expected Output: {}

Explanation: Edge case: no lines means no tasks and no edges.

Input: ['A before A']

Expected Output: {'A': ['A']}

Explanation: A self-dependency is still a valid directed edge at parse time.

Hints

Normalize each line with strip() before parsing.
The phrase 'install X after Y' reverses the edge direction: Y -> X.

Part 2: Return One Valid Topological Ordering

You are given a task list and a list of directed edges (u, v), meaning task u must happen before task v. Return the lexicographically smallest valid topological ordering. If the graph contains a cycle, return None. Using the lexicographically smallest order makes the output deterministic for testing.

Constraints

0 <= number of tasks <= 2 * 10^5
0 <= number of edges <= 2 * 10^5
Task names are strings
Duplicate task names and duplicate edges may appear

Examples

Input: (['A', 'B', 'C', 'D'], [('B', 'A'), ('C', 'A'), ('B', 'D')])

Expected Output: ['B', 'C', 'A', 'D']

Explanation: Both B and C are initially available, so pick B first because it is lexicographically smaller.

Input: (['X'], [])

Expected Output: ['X']

Explanation: Edge case: a single isolated task is already a valid topological order.

Input: ([], [])

Expected Output: []

Explanation: Edge case: the empty graph has an empty topological ordering.

Input: (['A', 'B'], [('A', 'B'), ('B', 'A')])

Expected Output: None

Explanation: A cycle makes topological ordering impossible.

Input: (['A', 'B', 'C'], [('A', 'C'), ('A', 'C'), ('B', 'C')])

Expected Output: ['A', 'B', 'C']

Explanation: Duplicate edges should not change the result.

Hints

Kahn's algorithm is a natural fit here.
Use a min-heap instead of a queue if you want the lexicographically smallest valid order.

Part 3: Detect and Report a Minimal Directed Cycle

You are given a directed graph as a task list and an edge list. Return one minimal cycle. For this problem, a minimal cycle means: 1. shortest by number of edges 2. if multiple shortest cycles exist, return the lexicographically smallest rotation of those cycles Return the cycle as a list of task names with the start node repeated at the end. If the graph is acyclic, return an empty list. Example: for A -> B -> C -> A, return ['A', 'B', 'C', 'A'].

Constraints

1 <= number of tasks <= 200 for the intended shortest-cycle approach
0 <= number of edges <= 5000
Duplicate edges may appear

Examples

Input: (['A', 'B', 'C', 'D'], [('A', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'D')])

Expected Output: ['A', 'B', 'C', 'A']

Explanation: The only cycle is A -> B -> C -> A.

Input: (['A', 'B', 'C'], [('A', 'A'), ('A', 'B'), ('B', 'C'), ('C', 'A')])

Expected Output: ['A', 'A']

Explanation: A self-loop is a cycle of length 1, which is minimal.

Input: (['A', 'B', 'C'], [('A', 'B'), ('B', 'C')])

Expected Output: []

Explanation: This graph is acyclic.

Input: (['B', 'A', 'D', 'C'], [('A', 'B'), ('B', 'A'), ('C', 'D'), ('D', 'C')])

Expected Output: ['A', 'B', 'A']

Explanation: There are two shortest 2-cycles, but ['A', 'B', 'A'] is lexicographically smaller than ['C', 'D', 'C'].

Hints

A shortest directed cycle through a start node can be found by looking for the shortest path from that start node to one of its predecessors.
BFS is useful because all edges have equal weight.

Part 4: Robust Dependency Parsing with Duplicate, Unknown, and Malformed Input Handling

You are given a known task catalog and a list of raw dependency strings. Parse the same three supported formats: - 'X->Y' - 'X before Y' - 'install X after Y' Handle bad input gracefully: - If a line is malformed, skip it and record its 0-based line index. - If a line references an unknown task, skip the edge and record each unknown task name once. - If an edge is duplicated, ignore the duplicate and count it. - Blank lines should be ignored. Return a summary object containing the cleaned graph and the recorded issues.

Constraints

0 <= len(known_tasks) <= 10^4
0 <= len(lines) <= 10^4
Total input length <= 2 * 10^5 characters

Examples

Input: (['A', 'B', 'C', 'D'], ['install A after B', 'B before D', 'B before D', 'X->A', 'bad line', 'C->A'])

Expected Output: {'graph': {'A': [], 'B': ['A', 'D'], 'C': ['A'], 'D': []}, 'ignored_duplicates': 1, 'unknown_tasks': ['X'], 'malformed_lines': [4]}

Explanation: One duplicate edge is ignored, one unknown task is reported, and one malformed line is recorded.

Input: (['A', 'B'], [])

Expected Output: {'graph': {'A': [], 'B': []}, 'ignored_duplicates': 0, 'unknown_tasks': [], 'malformed_lines': []}

Explanation: Edge case: no lines means no errors and an empty graph over the known tasks.

Input: (['A', 'B'], ['A before', 'Y before Z', 'install A after B'])

Expected Output: {'graph': {'A': [], 'B': ['A']}, 'ignored_duplicates': 0, 'unknown_tasks': ['Y', 'Z'], 'malformed_lines': [0]}

Explanation: The first line is malformed, the second line references only unknown tasks, and the third line is valid.

Input: (['A'], ['', 'A->A', 'A->A'])

Expected Output: {'graph': {'A': ['A']}, 'ignored_duplicates': 1, 'unknown_tasks': [], 'malformed_lines': []}

Explanation: Blank lines are ignored and duplicate self-edges are counted once.

Hints

Separate parsing from validation. First detect whether a line matches one of the supported formats, then check whether both tasks are known.
A set is useful both for deduplicating edges and for collecting unique unknown task names.

Part 5: Compute Time and Space Estimates for Dependency Processing

You are not asked to run topological sort here. Instead, compute the estimated work and memory usage of a dependency-processing pipeline. You are given: - L: total number of input characters - N: number of unique tasks - M: number of unique valid dependencies - lexicographic: whether topological sort uses a min-heap for deterministic lexicographic output Use this exact model: 1. Parsing scans every character once: L operations. 2. Building adjacency and indegree data from unique edges: M operations. 3. Topological processing: - If lexicographic is False: N queue removals + M edge relaxations. - If lexicographic is True: N heap pushes + N heap pops, each costing ceil(log2(N + 1)), plus M edge relaxations. 4. Peak memory cells = N adjacency buckets + N indegree slots + M edge slots = 2N + M. Return a tuple: (time_ops, space_cells, asymptotic_time, asymptotic_space).

Constraints

0 <= L, N, M <= 10^9
lexicographic is a boolean

Examples

Input: (25, 4, 3, False)

Expected Output: (35, 11, 'O(L + N + M)', 'O(N + M)')

Explanation: Time = 25 parse + 3 build + 4 queue removals + 3 relaxations = 35.

Input: (25, 4, 3, True)

Expected Output: (55, 11, 'O(L + M + N log N)', 'O(N + M)')

Explanation: ceil(log2(5)) = 3, so heap work is 2 * 4 * 3 = 24.

Input: (0, 0, 0, False)

Expected Output: (0, 0, 'O(L + N + M)', 'O(N + M)')

Explanation: Edge case: an empty dataset has zero exact work and zero exact storage.

Input: (10, 1, 0, True)

Expected Output: (12, 2, 'O(L + M + N log N)', 'O(N + M)')

Explanation: ceil(log2(2)) = 1, so heap work is 2 operations beyond parsing.

Hints

Be careful to include both graph-building work and topological-processing work.
For heap mode, use ceil(log2(N + 1)); when N = 0, the heap cost is 0.

Part 6: Batched Topological Sort for Large Graphs

To model processing very large graphs, implement a batched version of Kahn's algorithm. You are given tasks, directed edges (u, v), and a positive batch size B. Repeatedly: 1. collect up to B currently zero-indegree tasks in lexicographic order into one batch 2. remove that whole batch from the graph 3. newly unlocked tasks become candidates for later batches Return the list of batches. If the graph contains a cycle, return None. This mirrors a large-scale setting where you might emit ready tasks in chunks rather than one by one.

Constraints

0 <= number of tasks <= 2 * 10^5
0 <= number of edges <= 2 * 10^5
1 <= batch_size
Duplicate edges may appear

Examples

Input: (['A', 'B', 'C', 'D', 'E'], [('A', 'D'), ('B', 'D'), ('C', 'E')], 2)

Expected Output: [['A', 'B'], ['C', 'D'], ['E']]

Explanation: The first batch takes the two smallest ready tasks, A and B.

Input: (['A', 'B', 'C', 'D', 'E'], [('A', 'D'), ('B', 'D'), ('C', 'E')], 1)

Expected Output: [['A'], ['B'], ['C'], ['D'], ['E']]

Explanation: With batch size 1, this becomes a deterministic one-at-a-time topological sort.

Input: ([], [], 3)

Expected Output: []

Explanation: Edge case: an empty graph produces no batches.

Input: (['A', 'B'], [('A', 'B'), ('B', 'A')], 2)

Expected Output: None

Explanation: A cycle prevents all tasks from ever becoming ready.

Hints

This is still Kahn's algorithm; the main change is that you pop several ready nodes at once.
Use a min-heap so each batch is lexicographically deterministic.

Quick Overview

Implement topological sort from string input

Company: Netflix

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

Part 1: Parse Dependency Strings into a Directed Graph

Constraints

0 <= len(lines) <= 10^4
Total input length across all lines <= 2 * 10^5 characters
Task names contain only letters, digits, and underscores

Examples

Input: ['install A after B', 'C->A', 'B before D']

Expected Output: {'A': [], 'B': ['A', 'D'], 'C': ['A'], 'D': []}

Explanation: Three different input formats all contribute edges to the same graph.

Input: [' Task1 -> Task2 ']

Expected Output: {'Task1': ['Task2'], 'Task2': []}

Explanation: Spaces around the arrow should be ignored.

Input: []

Expected Output: {}

Explanation: Edge case: no lines means no tasks and no edges.

Input: ['A before A']

Expected Output: {'A': ['A']}

Explanation: A self-dependency is still a valid directed edge at parse time.

Hints

Normalize each line with strip() before parsing.
The phrase 'install X after Y' reverses the edge direction: Y -> X.

Part 2: Return One Valid Topological Ordering

Constraints

0 <= number of tasks <= 2 * 10^5
0 <= number of edges <= 2 * 10^5
Task names are strings
Duplicate task names and duplicate edges may appear

Examples

Input: (['A', 'B', 'C', 'D'], [('B', 'A'), ('C', 'A'), ('B', 'D')])

Expected Output: ['B', 'C', 'A', 'D']

Explanation: Both B and C are initially available, so pick B first because it is lexicographically smaller.

Input: (['X'], [])

Expected Output: ['X']

Explanation: Edge case: a single isolated task is already a valid topological order.

Input: ([], [])

Expected Output: []

Explanation: Edge case: the empty graph has an empty topological ordering.

Input: (['A', 'B'], [('A', 'B'), ('B', 'A')])

Expected Output: None

Explanation: A cycle makes topological ordering impossible.

Input: (['A', 'B', 'C'], [('A', 'C'), ('A', 'C'), ('B', 'C')])

Expected Output: ['A', 'B', 'C']

Explanation: Duplicate edges should not change the result.

Hints

Kahn's algorithm is a natural fit here.
Use a min-heap instead of a queue if you want the lexicographically smallest valid order.

Part 3: Detect and Report a Minimal Directed Cycle

Constraints

1 <= number of tasks <= 200 for the intended shortest-cycle approach
0 <= number of edges <= 5000
Duplicate edges may appear

Examples

Input: (['A', 'B', 'C', 'D'], [('A', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'D')])

Expected Output: ['A', 'B', 'C', 'A']

Explanation: The only cycle is A -> B -> C -> A.

Input: (['A', 'B', 'C'], [('A', 'A'), ('A', 'B'), ('B', 'C'), ('C', 'A')])

Expected Output: ['A', 'A']

Explanation: A self-loop is a cycle of length 1, which is minimal.

Input: (['A', 'B', 'C'], [('A', 'B'), ('B', 'C')])

Expected Output: []

Explanation: This graph is acyclic.

Input: (['B', 'A', 'D', 'C'], [('A', 'B'), ('B', 'A'), ('C', 'D'), ('D', 'C')])

Expected Output: ['A', 'B', 'A']

Explanation: There are two shortest 2-cycles, but ['A', 'B', 'A'] is lexicographically smaller than ['C', 'D', 'C'].

Hints

A shortest directed cycle through a start node can be found by looking for the shortest path from that start node to one of its predecessors.
BFS is useful because all edges have equal weight.

Part 4: Robust Dependency Parsing with Duplicate, Unknown, and Malformed Input Handling

Constraints

0 <= len(known_tasks) <= 10^4
0 <= len(lines) <= 10^4
Total input length <= 2 * 10^5 characters

Examples

Input: (['A', 'B', 'C', 'D'], ['install A after B', 'B before D', 'B before D', 'X->A', 'bad line', 'C->A'])

Expected Output: {'graph': {'A': [], 'B': ['A', 'D'], 'C': ['A'], 'D': []}, 'ignored_duplicates': 1, 'unknown_tasks': ['X'], 'malformed_lines': [4]}

Explanation: One duplicate edge is ignored, one unknown task is reported, and one malformed line is recorded.

Input: (['A', 'B'], [])

Expected Output: {'graph': {'A': [], 'B': []}, 'ignored_duplicates': 0, 'unknown_tasks': [], 'malformed_lines': []}

Explanation: Edge case: no lines means no errors and an empty graph over the known tasks.

Input: (['A', 'B'], ['A before', 'Y before Z', 'install A after B'])

Expected Output: {'graph': {'A': [], 'B': ['A']}, 'ignored_duplicates': 0, 'unknown_tasks': ['Y', 'Z'], 'malformed_lines': [0]}

Explanation: The first line is malformed, the second line references only unknown tasks, and the third line is valid.

Input: (['A'], ['', 'A->A', 'A->A'])

Expected Output: {'graph': {'A': ['A']}, 'ignored_duplicates': 1, 'unknown_tasks': [], 'malformed_lines': []}

Explanation: Blank lines are ignored and duplicate self-edges are counted once.

Hints

Separate parsing from validation. First detect whether a line matches one of the supported formats, then check whether both tasks are known.
A set is useful both for deduplicating edges and for collecting unique unknown task names.

Part 5: Compute Time and Space Estimates for Dependency Processing

Constraints

0 <= L, N, M <= 10^9
lexicographic is a boolean

Examples

Input: (25, 4, 3, False)

Expected Output: (35, 11, 'O(L + N + M)', 'O(N + M)')

Explanation: Time = 25 parse + 3 build + 4 queue removals + 3 relaxations = 35.

Input: (25, 4, 3, True)

Expected Output: (55, 11, 'O(L + M + N log N)', 'O(N + M)')

Explanation: ceil(log2(5)) = 3, so heap work is 2 * 4 * 3 = 24.

Input: (0, 0, 0, False)

Expected Output: (0, 0, 'O(L + N + M)', 'O(N + M)')

Explanation: Edge case: an empty dataset has zero exact work and zero exact storage.

Input: (10, 1, 0, True)

Expected Output: (12, 2, 'O(L + M + N log N)', 'O(N + M)')

Explanation: ceil(log2(2)) = 1, so heap work is 2 operations beyond parsing.

Hints

Be careful to include both graph-building work and topological-processing work.
For heap mode, use ceil(log2(N + 1)); when N = 0, the heap cost is 0.

Part 6: Batched Topological Sort for Large Graphs

Constraints

0 <= number of tasks <= 2 * 10^5
0 <= number of edges <= 2 * 10^5
1 <= batch_size
Duplicate edges may appear

Examples

Input: (['A', 'B', 'C', 'D', 'E'], [('A', 'D'), ('B', 'D'), ('C', 'E')], 2)

Expected Output: [['A', 'B'], ['C', 'D'], ['E']]

Explanation: The first batch takes the two smallest ready tasks, A and B.

Input: (['A', 'B', 'C', 'D', 'E'], [('A', 'D'), ('B', 'D'), ('C', 'E')], 1)

Expected Output: [['A'], ['B'], ['C'], ['D'], ['E']]

Explanation: With batch size 1, this becomes a deterministic one-at-a-time topological sort.

Input: ([], [], 3)

Expected Output: []

Explanation: Edge case: an empty graph produces no batches.

Input: (['A', 'B'], [('A', 'B'), ('B', 'A')], 2)

Expected Output: None

Explanation: A cycle prevents all tasks from ever becoming ready.

Hints

This is still Kahn's algorithm; the main change is that you pop several ready nodes at once.
Use a min-heap so each batch is lexicographically deterministic.