This question evaluates a candidate's ability to design scalable, fault-tolerant distributed systems for web crawling, covering concurrency, queueing and scheduling, deduplication, storage architecture, and observability.
Design a service that crawls images starting from a set of root URLs.
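A minimal single-process sketch can make the core of this prompt concrete before scaling it out. It uses a FIFO frontier, a seen-set for URL deduplication, and extraction of `<a href>` links and `<img src>` image URLs. It is an illustrative assumption, not a reference solution. `crawl_images`, `LinkAndImageParser`, the `fetch` hook, `max_pages`, and the restriction to the root URLs' hosts are all hypothetical choices. Networking sits behind `fetch`, so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


def normalize(base: str, ref: str) -> str:
    """Resolve a relative reference and drop the #fragment so duplicates collapse."""
    absolute, _fragment = urldefrag(urljoin(base, ref))
    return absolute


class LinkAndImageParser(HTMLParser):
    """Hypothetical helper: collects <a href> links and <img src> URLs from one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # attrs is a list of (name, value) pairs
        if tag == "a" and attr_map.get("href"):
            self.links.append(normalize(self.base_url, attr_map["href"]))
        elif tag == "img" and attr_map.get("src"):
            self.images.append(normalize(self.base_url, attr_map["src"]))


def crawl_images(
    roots: list[str],
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> set[str]:
    """Breadth-first crawl from the root URLs, returning every image URL seen.

    `fetch` is an assumed hook that returns page HTML, or None on failure.
    """
    allowed_hosts = {urlparse(r).netloc for r in roots}  # assumption: stay on root hosts
    frontier = deque(dict.fromkeys(roots))  # FIFO queue; dict.fromkeys drops duplicate roots
    seen_pages: set[str] = set(roots)  # dedup: each page URL is enqueued at most once
    images: set[str] = set()
    pages_fetched = 0
    while frontier and pages_fetched < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        pages_fetched += 1
        if html is None:
            continue  # a real service would record the failure and retry with backoff
        parser = LinkAndImageParser(url)
        parser.feed(html)
        parser.close()
        images.update(parser.images)
        for link in parser.links:
            if urlparse(link).netloc in allowed_hosts and link not in seen_pages:
                seen_pages.add(link)
                frontier.append(link)
    return images


pages = {
    "https://example.com/": '<a href="/a">A</a><img src="logo.png">',
    "https://example.com/a": '<a href="/#top">home</a><img src="/img/cat.jpg">',
}
print(sorted(crawl_images(["https://example.com/"], pages.get)))
# prints ['https://example.com/img/cat.jpg', 'https://example.com/logo.png']
```

In a distributed version, the in-memory `deque` and `set` would typically become a shared queue and a shared deduplication store. The `fetch` hook would also need politeness delays, retries, and robots.txt handling, none of which this sketch attempts.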
Requirements:
Deliverables: