Implement a web crawler that, given a starting URL and an interface get_links(url) -> Iterable[str] that returns the links found on a page, discovers all pages under the same hostname as the start. Requirements: visit each URL at most once, avoid cycles, and support a fixed-size worker pool for concurrent fetching. Return the set of discovered URLs. Discuss your choice of data structures, how you ensure thread safety, and how you would test the crawler.
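One way to approach it, sketched below in Python under a few assumptions not fixed by the problem statement: a threading-based worker pool (an asyncio or process-based design would also satisfy the requirements), hostname matching via urllib.parse, and illustrative names like crawl and num_workers. A lock-guarded visited set gives both at-most-once visits and cycle avoidance, because a URL is claimed in the set before it is enqueued; a Queue carries pending work, and Queue.join() detects quiescence.

```python
import threading
from queue import Queue
from typing import Callable, Iterable, Set
from urllib.parse import urlparse


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          num_workers: int = 8) -> Set[str]:
    """Discover all URLs reachable from start_url on the same hostname."""
    host = urlparse(start_url).hostname
    visited: Set[str] = {start_url}   # claimed URLs; doubles as cycle detection
    lock = threading.Lock()           # guards `visited`
    work: Queue = Queue()             # pending URLs; None is a shutdown sentinel
    work.put(start_url)

    def worker() -> None:
        while True:
            url = work.get()
            if url is None:           # sentinel: pool is shutting down
                work.task_done()
                return
            try:
                for link in get_links(url):
                    if urlparse(link).hostname != host:
                        continue      # stay under the same hostname
                    with lock:
                        if link in visited:
                            continue  # already claimed by some worker
                        visited.add(link)  # claim before enqueue: at-most-once
                    work.put(link)
            finally:
                work.task_done()      # count the task even if get_links raised

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    work.join()                       # blocks until every enqueued URL is processed
    for _ in threads:
        work.put(None)                # release the workers
    for t in threads:
        t.join()
    return visited
```

For testing, the get_links interface makes the crawler easy to drive without a network: back it with a dict mapping URLs to link lists and assert the returned set, covering a cycle (a -> b -> a), an off-host link that must be excluded, and a get_links that raises. Concurrency behavior can be probed by wrapping the fake in a callable that counts concurrent entries or injects small sleeps, checking that no URL is fetched twice.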