Implement a same-host web crawler

Q: Implement a same-host web crawler

This question evaluates a candidate's competency in graph traversal and state management, URL parsing and validation, duplicate detection and cycle handling, and concurrent coordination with per-host rate limiting.

Q: How do I approach Coding & Algorithms interview questions?

Coding & Algorithms questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master coding & algorithms interviews.

Question

Implement a web crawler. Given a start URL and a function get_links(url) -> list of URLs, return all URLs that share the same hostname as the start. Avoid revisiting pages, handle cycles and duplicate links, and be robust to malformed URLs. Describe time/space complexity and provide code. As a follow-up, explain how you would extend it to run concurrently with multiple workers while preserving correctness (no duplicate fetches) and respecting a per-host rate limit.

Implement a same-host web crawler

Overview

Comments (0)