Implement DFS with cycle detection and topo order
Company: Snowflake
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates mastery of graph algorithms—specifically depth-first search (both recursive and iterative), directed cycle detection, topological ordering, and discovery/finish timestamping—and falls under the Coding & Algorithms domain.
Constraints
- 1 <= N <= 100000
- 0 <= M <= 2*10^5
- 1 <= u, v <= N for every edge (u, v)
- Self-loops (u == v) and parallel edges are allowed
- The graph may be disconnected
- The iterative DFS must not rely on raising Python's recursion limit
Examples
Input: (4, [(1, 2), (2, 3), (3, 4)])
Expected Output: {'is_dag': True, 'topo_order': [1, 2, 3, 4], 'cycle': None, 'discover': {1: 1, 2: 2, 3: 3, 4: 4}, 'finish': {1: 8, 2: 7, 3: 6, 4: 5}}
Explanation: Simple DAG chain 1->2->3->4. Valid topo order [1,2,3,4]; no cycle. CLRS timestamps increment on each discover and finish.
Input: (4, [(1, 2), (2, 3), (3, 2), (3, 4)])
Expected Output: {'is_dag': False, 'topo_order': None, 'cycle': [2, 3, 2], 'discover': {1: 1, 2: 2, 3: 3, 4: 4}, 'finish': {1: 8, 2: 7, 3: 6, 4: 5}}
Explanation: Edge 3->2 is a back edge (2 is GRAY ancestor of 3), so a directed cycle [2,3,2] is reported and topo_order is None.
Input: (3, [(2, 2)])
Expected Output: {'is_dag': False, 'topo_order': None, 'cycle': [2, 2], 'discover': {1: 1, 2: 3, 3: 5}, 'finish': {1: 2, 2: 4, 3: 6}}
Explanation: Self-loop on node 2: when scanning 2's neighbors, 2 is still GRAY, so the cycle is reported as [2,2].
Input: (3, [(1, 2), (1, 2), (2, 3)])
Expected Output: {'is_dag': True, 'topo_order': [1, 2, 3], 'cycle': None, 'discover': {1: 1, 2: 2, 3: 3}, 'finish': {1: 6, 2: 5, 3: 4}}
Explanation: Parallel edges (1->2 twice) are processed but the second is ignored once 2 is no longer WHITE; the graph is still a DAG.
Input: (5, [(1, 2), (3, 4)])
Expected Output: {'is_dag': True, 'topo_order': [5, 3, 4, 1, 2], 'cycle': None, 'discover': {1: 1, 2: 2, 3: 5, 4: 6, 5: 9}, 'finish': {1: 4, 2: 3, 3: 8, 4: 7, 5: 10}}
Explanation: Disconnected graph: components {1,2}, {3,4}, and isolated node 5. DFS starts from each WHITE root in ascending order; timestamps continue across components.
Input: (1, [])
Expected Output: {'is_dag': True, 'topo_order': [1], 'cycle': None, 'discover': {1: 1}, 'finish': {1: 2}}
Explanation: Single node, no edges. Trivial DAG: discover=1, finish=2, topo_order=[1].
Input: (2, [(1, 2), (2, 1)])
Expected Output: {'is_dag': False, 'topo_order': None, 'cycle': [1, 2, 1], 'discover': {1: 1, 2: 2}, 'finish': {1: 4, 2: 3}}
Explanation: Two-node cycle 1->2->1. Edge 2->1 hits GRAY ancestor 1, yielding cycle [1,2,1].
Hints
- Use the white/gray/black coloring scheme: WHITE=undiscovered, GRAY=on the current DFS stack, BLACK=finished. An edge to a GRAY node is a back edge => a directed cycle.
- Run an explicit-stack iterative DFS where each stack frame stores (node, next_neighbor_index) so you can pause and resume the neighbor scan without recursion.
- Keep one global clock; increment it on discovery and again on finish. The topological order (for a DAG) is the reverse of the order in which nodes finish.
- Track a parent pointer when you first discover a node. On a back edge u->v, walk parents from u up to v and append v to reconstruct the cycle; a self-loop is the special case u == v giving [u, u].