Find first error and propagate failures
Company: Snowflake
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency with algorithmic invariants and search techniques, specifically binary search on ordered logs, graph traversal (BFS/DFS) for reachability and longest-path reasoning, cycle handling, and formal analysis of correctness and time/space complexity within the Coding & Algorithms domain.
Constraints
- 2 <= len(logs) <= 200000
- Each log entry starts with exactly one of: "[Info]", "[Warn]", "[Error]"
- Monotonic logs: once the first "[Error]" appears, all subsequent entries are "[Error]"
- The entry immediately before the first "[Error]" is "[Warn]" (thus, the first error index >= 1) unless no error exists
- 0 <= number of services <= 100000; 0 <= number of edges <= 200000
- Edges (u, v) mean u depends on v; failure propagates from v to u
- Service names are non-empty ASCII strings and uniquely identify services
- For part (3), the call graph is a DAG (thus the reverse graph is also acyclic)
- Return the failing services as a lexicographically sorted list; for longest chains, break ties by lexicographically smallest sequence
Solution
from collections import defaultdict, deque
from functools import lru_cache
import sys
def analyze_incidents(logs: list[str], edges: list[tuple[str, str]], s: str) -> tuple[int, list[str], list[str]]:
# Task 1: First [Error] index via binary search (lower_bound of predicate is_error)
n = len(logs)
lo, hi = 0, n # [lo, hi)
def is_error(i: int) -> bool:
return logs[i].startswith('[Error]')
while lo < hi:
mid = (lo + hi) // 2
if is_error(mid):
hi = mid
else:
lo = mid + 1
first_error_idx = lo if lo < n and is_error(lo) else -1
# Task 2: Build reverse graph and collect all failing services via BFS from s
rev = defaultdict(set)
for u, v in edges:
rev[v].add(u) # reverse edge: failure propagates from v to u
failing = set()
dq = deque([s])
while dq:
u = dq.popleft()
if u in failing:
continue
failing.add(u)
for w in rev.get(u, ()): # callers of u
if w not in failing:
dq.append(w)
failing_sorted = sorted(failing)
# Task 3: Longest chain from s in reverse graph (assume DAG). Use DP for length and next-pointer.
# Restrict adjacency to reachable (failing) nodes for efficiency/determinism.
adj = {u: sorted([v for v in rev.get(u, ()) if v in failing]) for u in failing}
# Recursion depth can be as large as longest chain; raise limit conservatively.
sys.setrecursionlimit(max(1000000, len(failing) * 2 + 10))
@lru_cache(None)
def dp(u: str) -> tuple[int, str | None]:
# Returns (best_length_from_u_inclusive, best_next_node)
best_len = 1
best_next = None
for v in adj.get(u, []):
l_v, _ = dp(v)
cand_len = 1 + l_v
if cand_len > best_len:
best_len = cand_len
best_next = v
elif cand_len == best_len:
if best_next is None or v < best_next:
best_next = v
return best_len, best_next
# Reconstruct chain from s using next-pointers; guard against unexpected cycles by halting on repeats.
chain = []
seen = set()
cur = s
while cur is not None and cur not in seen:
chain.append(cur)
seen.add(cur)
_, nxt = dp(cur)
cur = nxt
return first_error_idx, failing_sorted, chain
Explanation
Time complexity: O(log n + V + E). Space complexity: O(V + E).
Hints
- For the first error index, the predicate "logs[i] starts with '[Error]'" is monotone; use binary search to find the lowest i with predicate true.
- Build the reverse graph (for edge u->v, add v->u) and run BFS/DFS from s to find all upstream callers that fail.
- For the longest chain on a DAG, use DFS with memoization to compute the longest path length from each node; to break ties, pick the next node with the smallest name among those achieving the same length, then reconstruct the path from s.