PracHub

Design an image crawler for unlimited URLs

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design scalable, fault-tolerant distributed systems for web crawling, covering competencies in concurrency, queueing and scheduling, deduplication, storage architecture, and observability.


Company: Atlassian

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite


Jan 5, 2026, 12:00 AM

Design a service that crawls images starting from a set of root URLs.

Requirements:

  • Input: one or more root URLs.
  • Crawl pages, discover links, and download image resources.
  • Support an unlimited number of root URLs and unlimited crawl depth.
  • Must handle failures (network errors, timeouts, crashes) and avoid re-crawling the same URL excessively.
  • Discuss storage for downloaded images and metadata.
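
To make the link-discovery and "avoid re-crawling" requirements above concrete, here is a minimal Python sketch using only the standard library. It is one possible approach, not a reference solution: the names `normalize_url`, `LinkAndImageParser`, and `discover` are hypothetical, the canonicalization rules are a simplified assumption, and the in-memory `seen` set stands in for whatever shared dedup store a real deployment would use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings share one dedup key."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port and parts.port != default_port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, host, path, parts.query, ""))


class LinkAndImageParser(HTMLParser):
    """Collects <a href> page links and <img src> image URLs from one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("href"):
            self.links.append(normalize_url(urljoin(self.base_url, attr["href"])))
        elif tag == "img" and attr.get("src"):
            self.images.append(normalize_url(urljoin(self.base_url, attr["src"])))


def discover(base_url: str, html: str, seen: set[str]):
    """Return (new page links, new image URLs), skipping anything already seen."""
    parser = LinkAndImageParser(base_url)
    parser.feed(html)
    new_links: list[str] = []
    new_images: list[str] = []
    for found, out in ((parser.links, new_links), (parser.images, new_images)):
        for url in found:
            # Ignore non-HTTP schemes such as mailto: and javascript:.
            if url.startswith(("http://", "https://")) and url not in seen:
                seen.add(url)
                out.append(url)
    return new_links, new_images
```

With unlimited roots and depth, the `seen` set would not fit in one process's memory; a Bloom filter or a keyed table in a shared store is the usual substitute, at the cost of either false positives or extra lookups.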

Deliverables:

  • High-level architecture (components, data flow).
  • Queue/scheduler design and politeness (per-host rate limiting).
  • Deduplication strategy.
  • DB schema for crawl state and results.
  • Failure/retry model and monitoring.
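
As a starting point for the queue/scheduler and failure/retry deliverables above, the following Python sketch shows one way to combine per-host politeness with bounded retries. Everything here is an assumption made for illustration: the class name `PoliteFrontier`, the one-second default delay, the three-attempt budget, and the choice to back off the whole host (rather than a single URL) after a failure. It is single-process and in-memory; a distributed design would move the queues and timers into durable shared storage.

```python
import heapq
from collections import defaultdict, deque
from urllib.parse import urlsplit


class PoliteFrontier:
    """In-memory URL frontier: one FIFO queue per host plus a ready-time heap."""

    def __init__(self, delay_s: float = 1.0, max_attempts: int = 3) -> None:
        self.delay_s = delay_s            # assumed minimum gap between hits to one host
        self.max_attempts = max_attempts  # assumed retry budget per URL
        self.queues = defaultdict(deque)  # host -> deque of (url, attempt)
        self.ready = []                   # heap of (earliest_fetch_time, host)
        self.scheduled = set()            # hosts that currently have a heap entry
        self.next_ok = {}                 # host -> earliest time of its next fetch
        self.dead = []                    # URLs that exhausted their retries

    def add(self, url: str, now: float, attempt: int = 0) -> None:
        host = urlsplit(url).hostname or ""
        self.queues[host].append((url, attempt))
        if host not in self.scheduled:
            self.scheduled.add(host)
            when = max(now, self.next_ok.get(host, now))
            heapq.heappush(self.ready, (when, host))

    def pop(self, now: float):
        """Return (url, attempt) for a host whose delay has elapsed, else None."""
        while self.ready and self.ready[0][0] <= now:
            _, host = heapq.heappop(self.ready)
            allowed = self.next_ok.get(host, 0.0)
            if allowed > now:
                # Host was backed off after this entry was queued; reschedule it.
                heapq.heappush(self.ready, (allowed, host))
                continue
            url, attempt = self.queues[host].popleft()
            self.next_ok[host] = now + self.delay_s
            if self.queues[host]:
                heapq.heappush(self.ready, (self.next_ok[host], host))
            else:
                self.scheduled.discard(host)
            return url, attempt
        return None

    def retry(self, url: str, attempt: int, now: float) -> bool:
        """Requeue a failed URL with exponential host backoff, or give up."""
        if attempt + 1 >= self.max_attempts:
            self.dead.append(url)  # candidate for a dead-letter table and an alert
            return False
        host = urlsplit(url).hostname or ""
        backoff = self.delay_s * 2 ** (attempt + 1)
        self.next_ok[host] = max(self.next_ok.get(host, now), now + backoff)
        self.add(url, now, attempt + 1)
        return True
```

A worker loop would call `pop(time.monotonic())`, fetch the URL, and call `retry` on timeouts or network errors. Passing `now` explicitly keeps the scheduling logic deterministic and easy to test.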
