This question evaluates a candidate's skills in concurrent system design for a web crawler: concurrency primitives, synchronization, URL frontier management, per-host politeness and rate limiting, fault tolerance, and trade-off analysis.

Design and implement a web crawler that starts from a set of seed URLs and explores the web while respecting operational constraints (robots.txt, scope, depth). Then extend the design to three concurrent implementations and compare them.
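Below is a minimal sketch of a sequential baseline. It is not a reference answer. It uses only the standard library (html.parser for link extraction, urllib.robotparser for robots.txt rules), so it runs offline. Several names are assumptions introduced for illustration. `fetch` and `robots_txt` are hypothetical injected callables that stand in for an HTTP client; in a real crawler, `robots_txt(host)` would download `https://<host>/robots.txt`. Scope is modeled as a set of allowed hosts, and depth counts link hops from the seeds, which start at depth 0. Both of these are illustrative choices, not requirements stated in the question.

```python
import collections
import urllib.parse
import urllib.robotparser
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Optional, Set


class LinkExtractor(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url: str, html: str) -> List[str]:
    """Resolve hrefs against base_url, drop #fragments, keep http(s) only."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urllib.parse.urldefrag(urllib.parse.urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], Optional[str]],   # hypothetical: URL -> HTML, or None on failure
    robots_txt: Callable[[str], str],        # hypothetical: host -> robots.txt text
    allowed_hosts: Set[str],                 # scope (assumption): only these hosts are crawled
    max_depth: int,                          # seeds are depth 0
    user_agent: str = "example-crawler",     # hypothetical user-agent string
) -> List[str]:
    robots: Dict[str, urllib.robotparser.RobotFileParser] = {}  # one parser cached per host

    def permitted(url: str) -> bool:
        host = urllib.parse.urlsplit(url).netloc
        if host not in allowed_hosts:
            return False
        if host not in robots:
            parser = urllib.robotparser.RobotFileParser()
            parser.parse(robots_txt(host).splitlines())
            robots[host] = parser
        return robots[host].can_fetch(user_agent, url)

    seeds = list(dict.fromkeys(seeds))                       # dedupe seeds, keep order
    frontier = collections.deque((url, 0) for url in seeds)  # FIFO queue => breadth-first order
    seen: Set[str] = set(seeds)                              # every URL ever enqueued
    visited: List[str] = []
    while frontier:
        url, depth = frontier.popleft()
        if not permitted(url):
            continue
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        if depth >= max_depth:
            continue                                         # depth limit: do not expand further
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited
```

The `seen` set is updated when a URL is enqueued rather than when it is fetched, so each URL enters the frontier at most once. Caching one robots parser per host means robots.txt is read once per host instead of once per URL.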
Assume you can use Python and reasonable open-source libraries for HTTP and HTML parsing. Focus on correctness, liveness, and performance trade-offs. Show code-level designs (pseudo-code or real code skeletons) and justify data structures and synchronization choices.
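One possible concurrent variant is a thread pool that shares a `queue.Queue` frontier. This is a sketch, not a reference answer. A threaded design is only one option; asyncio-based and process-based designs are other common choices. The sketch rests on several assumptions:

- `fetch` and `extract_links` are hypothetical injected callables.
- Scope and robots.txt filtering are expected to happen inside `extract_links` or before a URL is enqueued.
- `HostPoliteness` is an illustrative helper that spaces requests to the same host by a fixed delay.
- The default worker count and delay are arbitrary.

```python
import queue
import threading
import time
import urllib.parse
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class HostPoliteness:
    """Hypothetical helper: spaces requests to the same host at least `delay` seconds apart."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self._lock = threading.Lock()
        self._next_slot: Dict[str, float] = {}  # host -> earliest start time of its next request

    def wait(self, host: str) -> None:
        with self._lock:                         # reserve a time slot atomically
            now = time.monotonic()
            slot = max(now, self._next_slot.get(host, now))
            self._next_slot[host] = slot + self.delay
        time.sleep(slot - now)                   # sleep outside the lock so other hosts are not blocked


def threaded_crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], Optional[str]],           # hypothetical: URL -> HTML, or None
    extract_links: Callable[[str, str], List[str]],  # hypothetical: (URL, HTML) -> in-scope links
    max_depth: int,
    num_workers: int = 8,
    per_host_delay: float = 1.0,
) -> Tuple[List[str], List[str]]:
    seeds = list(dict.fromkeys(seeds))
    frontier: "queue.Queue[Optional[Tuple[str, int]]]" = queue.Queue()
    seen = set(seeds)
    seen_lock = threading.Lock()     # guards the check-and-add on `seen`
    results_lock = threading.Lock()  # guards `visited` and `failed`
    visited: List[str] = []
    failed: List[str] = []
    polite = HostPoliteness(per_host_delay)

    def worker() -> None:
        while True:
            item = frontier.get()
            if item is None:         # shutdown sentinel
                frontier.task_done()
                return
            url, depth = item
            try:
                polite.wait(urllib.parse.urlsplit(url).netloc)
                html = fetch(url)
                if html is None:
                    continue
                with results_lock:
                    visited.append(url)
                if depth < max_depth:
                    for link in extract_links(url, html):
                        with seen_lock:
                            if link in seen:
                                continue
                            seen.add(link)
                        frontier.put((link, depth + 1))  # enqueued before this item's task_done
            except Exception:
                with results_lock:
                    failed.append(url)                   # one bad page must not kill the worker
            finally:
                frontier.task_done()

    for url in seeds:
        frontier.put((url, 0))
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()
    frontier.join()                  # every enqueued URL processed; no new ones can appear
    for _ in threads:
        frontier.put(None)
    for t in threads:
        t.join()
    return visited, failed
```

Liveness comes from `Queue.join()`. Every `put` is matched by exactly one `task_done`. A worker enqueues a page's children before marking that page done, so `join()` returns only when the frontier is truly drained; the sentinels then stop the workers. Deduplication is an atomic check-and-add under `seen_lock`, so two workers cannot both enqueue the same link.

One trade-off is that `HostPoliteness.wait` blocks the calling thread. A heavily throttled host can therefore tie up workers that could otherwise fetch from other hosts. Per-host sub-queues avoid this at the cost of more bookkeeping.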