PracHub

Design a concurrent web crawler

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's skills in concurrent system design, including concurrency primitives, synchronization, URL frontier management, per-host politeness and rate-limiting, fault tolerance, and trade-off analysis for a web crawler.

Company: Anthropic

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Design and implement a web crawler. First, build a single‑threaded version that, given a set of seed URLs, fetches pages, extracts links, deduplicates visits, respects robots.txt, and stops at a configurable depth and/or domain scope. Next, extend it to a concurrent version and compare three approaches: (a) manual multithreading using queues and locks, (b) a fixed‑size thread pool, and (c) an asyncio/event‑loop model. For each approach, explain and implement how you: maintain a URL frontier, enforce per‑host politeness and rate limits, avoid duplicate fetches, handle failures/retries and backoff, manage back‑pressure, and perform graceful shutdown. Discuss data structures (e.g., visited sets, frontier queues, per‑host buckets), synchronization primitives, and mechanisms to prevent deadlocks/starvation. Analyze time/space complexity and evaluate correctness, liveness, and performance trade‑offs across the three concurrency models.
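
As a starting point for the single-threaded version described above, the following is a minimal sketch, not a reference solution. It assumes two injected callables that are not part of the question: `fetch(url)`, which returns the page HTML or `None` on failure, and `is_allowed(url)`, which stands in for a robots.txt check (in real code this could be backed by `urllib.robotparser.RobotFileParser.can_fetch`). Retries and backoff are left out to keep the baseline short.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in `html`, with fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    out = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlsplit(absolute).scheme in ("http", "https"):
            out.append(absolute)
    return out


def crawl(seeds, fetch, max_depth, allowed_hosts=None, is_allowed=lambda url: True):
    """Breadth-first crawl; returns URLs in the order they were fetched.

    `fetch` and `is_allowed` are hypothetical injected callables (see lead-in).
    """
    frontier = deque((url, 0) for url in seeds)
    visited = set(seeds)  # marked at enqueue time, so each URL is queued once
    order = []
    while frontier:
        url, depth = frontier.popleft()
        if allowed_hosts is not None and urlsplit(url).hostname not in allowed_hosts:
            continue  # outside the domain scope
        if not is_allowed(url):
            continue  # disallowed by robots.txt
        html = fetch(url)
        if html is None:
            continue  # fetch failed; a fuller version would retry with backoff
        order.append(url)
        if depth >= max_depth:
            continue  # fetch pages at max depth but do not expand them
        for link in extract_links(url, html):
            if link not in visited:
                visited.add(link)
                frontier.append((link, depth + 1))
    return order
```

Marking a URL as visited when it is enqueued, rather than when it is fetched, keeps the frontier free of duplicates and bounds its size by the number of distinct URLs discovered.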

Sep 6, 2025, 12:00 AM

System Design: Web Crawler (Single-threaded and Concurrent)

Context

Design and implement a web crawler that starts from a set of seed URLs and explores the web while respecting operational constraints (robots.txt, scope, depth). Then extend the design to three concurrent implementations and compare them.

Assume you can use Python and reasonable open-source libraries for HTTP and HTML parsing. Focus on correctness, liveness, and performance trade-offs. Show code-level designs (pseudo-code or real code skeletons) and justify data structures and synchronization choices.
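
To illustrate the kind of data-structure and synchronization choice the prompt asks you to justify, here is a hedged sketch of two building blocks for the thread-based variants. The class names `VisitedSet` and `HostGate`, and the `clock` and `sleep` parameters, are illustrative assumptions rather than part of the question. The sketch shows an atomic check-and-insert for deduplication and a per-host politeness gate that holds its lock only long enough to book a time slot.

```python
import threading
import time
from urllib.parse import urlsplit


class VisitedSet:
    """Thread-safe set; `add_if_new` is an atomic check-and-insert."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def add_if_new(self, url):
        """Return True only for the first caller that offers `url`."""
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
            return True


class HostGate:
    """Books per-host fetch slots spaced at least `min_delay` seconds apart."""

    def __init__(self, min_delay, clock=time.monotonic, sleep=time.sleep):
        self._min_delay = min_delay
        self._clock = clock            # injectable for tests (assumption)
        self._sleep = sleep
        self._next_slot = {}           # host -> earliest time of the next fetch
        self._lock = threading.Lock()

    def reserve(self, url):
        """Book the next slot for the URL's host; return seconds to wait."""
        host = urlsplit(url).hostname
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot.get(host, now))
            self._next_slot[host] = slot + self._min_delay
        return slot - now

    def wait(self, url):
        """Block the calling thread until its reserved slot arrives."""
        delay = self.reserve(url)
        if delay > 0:
            self._sleep(delay)         # sleep outside the lock so other hosts proceed
```

Because each structure uses a single lock and never acquires a second one while holding it, lock-ordering deadlocks cannot arise within these two classes. One trade-off worth raising in an interview: a worker sleeping in `wait` still occupies a thread, so a crawl dominated by one slow host can starve others unless the frontier is split into per-host queues. A bounded `queue.Queue(maxsize=...)` frontier gives back-pressure, but workers that both consume from and produce into it can deadlock if every worker blocks on `put` at once.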

Requirements

  1. Single-threaded crawler
    • Inputs: seed URLs, max depth, optional allowed domain scope.
    • Fetch pages, extract links, deduplicate visits, respect robots.txt, stop by depth and/or scope.
    • Handle failures/retries with backoff.
  2. Concurrent crawler: compare three approaches:
    a) Manual multithreading using queues and locks
    b) Fixed-size thread pool
    c) Asyncio/event-loop model
    For each approach, explain and implement how you:
    • Maintain a URL frontier
    • Enforce per-host politeness and rate limits
    • Avoid duplicate fetches
    • Handle failures/retries and backoff
    • Manage back-pressure
    • Perform graceful shutdown
  3. Discuss
    • Data structures (visited sets, frontier queues, per-host buckets)
    • Synchronization primitives and how you prevent deadlocks/starvation
    • Time/space complexity
    • Correctness, liveness, and performance trade-offs across the three models
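
For the asyncio/event-loop model in requirement 2, one possible shape is sketched below; it is a sketch under stated assumptions, not a definitive answer. It assumes a hypothetical coroutine `fetch(url)` that returns a list of absolute links (HTTP and HTML parsing are out of scope here), and the names `crawl_async` and `fetch_with_retry` are illustrative. It shows a shared frontier queue, deduplication without locks, one in-flight request per host, retries with exponential backoff, and shutdown by joining the queue and then cancelling the workers.

```python
import asyncio
from urllib.parse import urlsplit


async def fetch_with_retry(fetch, url, attempts=3, base_delay=0.5):
    """Retry `fetch` with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return await fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


async def crawl_async(seeds, fetch, max_depth, workers=4, min_delay=0.0, base_delay=0.5):
    """Crawl with a fixed number of worker tasks; returns the fetched URLs.

    `fetch` is a hypothetical coroutine returning a list of absolute links.
    """
    frontier = asyncio.Queue()  # unbounded; concurrency is capped by `workers`
    visited = set(seeds)        # no lock needed: only one task runs between awaits
    host_locks = {}             # host -> asyncio.Lock
    fetched = []
    for url in seeds:
        frontier.put_nowait((url, 0))

    async def worker():
        while True:
            url, depth = await frontier.get()
            try:
                host = urlsplit(url).hostname
                lock = host_locks.setdefault(host, asyncio.Lock())
                async with lock:  # at most one in-flight request per host
                    links = await fetch_with_retry(fetch, url, base_delay=base_delay)
                    await asyncio.sleep(min_delay)  # politeness gap before release
                fetched.append(url)
                if depth < max_depth:
                    for link in links:
                        if link not in visited:
                            visited.add(link)
                            frontier.put_nowait((link, depth + 1))
            except Exception:
                pass  # retries exhausted: drop this URL
            finally:
                frontier.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await frontier.join()  # returns once every queued URL has been processed
    for task in tasks:
        task.cancel()      # workers are idle in frontier.get() at this point
    await asyncio.gather(*tasks, return_exceptions=True)
    return fetched
```

The frontier is deliberately unbounded in this sketch: workers both consume from and produce into the same queue, so a bounded queue with a blocking `put` could leave every worker waiting on a full queue. Back-pressure comes from the fixed worker count instead. With V distinct URLs and E discovered links, the visited set and frontier take O(V) space and the crawl does O(V + E) bookkeeping work, excluding network time.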

