PracHub

Scale crawler with thread pool

Last updated: Apr 22, 2026

Quick Overview

This question evaluates system design and concurrent programming competencies, focusing on bounded thread-pool architecture, thread-safe frontier and deduplication structures, per-host politeness and in-flight limits, reliability mechanisms (timeouts, retries, backoff), and graceful shutdown and termination semantics.

Company: Anthropic

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Refactor the crawler to run concurrently using a bounded thread pool. Design a thread-safe URL frontier (work queue) and a thread-safe visited set to prevent duplicate fetches. Explain worker lifecycle, task acquisition, and termination conditions for the pool. Describe rate limiting per host, timeouts, retries for transient failures, and how to support cancellation and graceful shutdown. Compare approaches—coarse-grained locks, fine-grained locks, lock-free/concurrent data structures, and message-queue based designs—and analyze trade-offs in contention, throughput, fairness, and memory usage.
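Of the reliability mechanisms named above, retries with backoff are the easiest to illustrate in isolation. The following is a minimal sketch, not a prescribed answer. The names `TransientError` and `fetch_with_retry`, the `fetch` callback, and all default values are assumptions made for illustration. The injectable `sleep` and `rng` parameters exist only so the behavior can be checked without real waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""


def fetch_with_retry(fetch, url, max_attempts=4, base_delay=0.5,
                     max_delay=30.0, sleep=time.sleep, rng=random.random):
    """Call fetch(url), retrying TransientError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the failure
            # The cap doubles on each attempt and is bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: sleep a random fraction of the cap so that many
            # workers retrying the same host do not wake in lockstep.
            sleep(rng() * cap)
```

Only errors the caller classifies as transient are retried here; anything else propagates immediately, so a permanent failure such as a malformed URL does not consume the retry budget.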

Sep 6, 2025, 12:00 AM
Concurrent Web Crawler — Bounded Thread Pool, Thread-Safe Frontier, Dedupe, Politeness, and Trade-offs

You are refactoring an existing single-threaded crawler to run concurrently. Design the system and explain key concurrency and reliability concerns.

Requirements

  1. Concurrency & Backpressure
    • Run crawling with a bounded thread pool. Explain worker lifecycle, how work is acquired, and when workers/threads terminate.
  2. Thread-Safe Data Structures
    • URL frontier (work queue): thread-safe, prevents head-of-line blocking, and supports backpressure.
    • Visited set: thread-safe deduplication with atomic test-and-set semantics.
  3. Politeness & Reliability
    • Per-host rate limiting and max in-flight requests per host.
    • Timeouts, plus retries with backoff for transient failures.
  4. Control & Shutdown
    • Support cancellation and graceful shutdown.
    • Define the termination (quiescence) condition: what must be true for the crawl to count as "done."
  5. Design Alternatives & Analysis
    • Compare coarse-grained locks, fine-grained locks, lock-free/concurrent data structures, and message-queue based designs.
    • Analyze trade-offs in contention, throughput, fairness, and memory usage.
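The politeness requirement (item 3) can be sketched as a small per-host gate. This is one possible shape under stated assumptions, not the expected answer. `HostGate`, its parameters, and its defaults are hypothetical, and the injectable `clock` and `sleep` are there only to make the logic testable.

```python
import threading
import time
from collections import defaultdict
from urllib.parse import urlsplit


class HostGate:
    """Per-host minimum spacing between requests plus an in-flight cap."""

    def __init__(self, min_delay=1.0, max_in_flight=2,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()            # guards both dicts below
        self._next_ok = defaultdict(float)       # host -> earliest start time
        self._slots = defaultdict(
            lambda: threading.Semaphore(max_in_flight))

    def acquire(self, url):
        """Block until a request to url's host is allowed; return the host."""
        host = urlsplit(url).netloc
        with self._lock:
            slot = self._slots[host]
        slot.acquire()                           # cap in-flight requests
        with self._lock:
            now = self._clock()
            start = max(now, self._next_ok[host])
            # Reserve the next start time so callers are spaced min_delay apart.
            self._next_ok[host] = start + self._min_delay
        if start > now:
            self._sleep(start - now)
        return host

    def release(self, host):
        """Mark one request to host as finished."""
        with self._lock:
            slot = self._slots[host]
        slot.release()
```

A worker would call `acquire(url)` before fetching and `release(host)` in a `finally` block. Note the trade-off: sleeping inside `acquire` ties up a worker thread, so a slow host can cause head-of-line blocking. A fuller design would instead requeue the URL with a ready time and let the worker pick up work for another host.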

Assume a single-process, multi-threaded crawler (language/runtime of your choice). Provide concise pseudo-code and justify design choices.
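As a starting point, the core loop can be sketched with Python's standard `queue` and `threading` modules. This is a minimal illustration under assumptions, not a reference solution: `crawl`, `mark_if_new`, and the `fetch_links` callback (which returns the links found on a page) are hypothetical names. The visited set uses a single coarse-grained lock, the simplest of the approaches the question asks to compare.

```python
import queue
import threading


def crawl(seeds, fetch_links, num_workers=4):
    """Crawl from seeds with a fixed pool; return the set of visited URLs."""
    frontier = queue.Queue()            # thread-safe FIFO work queue
    visited = set()
    visited_lock = threading.Lock()     # coarse-grained lock for dedupe

    def mark_if_new(url):
        # Atomic test-and-set: exactly one caller gets True per URL.
        with visited_lock:
            if url in visited:
                return False
            visited.add(url)
            return True

    def worker():
        while True:
            url = frontier.get()        # blocks until work is available
            try:
                if url is None:         # sentinel: this worker should exit
                    return
                for link in fetch_links(url):
                    if mark_if_new(link):
                        frontier.put(link)
            except Exception:
                pass                    # a real crawler would log or retry
            finally:
                # Children are enqueued before the parent is marked done,
                # so the unfinished count cannot reach zero too early.
                frontier.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for seed in seeds:
        if mark_if_new(seed):
            frontier.put(seed)
    frontier.join()                     # quiescence: no queued or running work
    for _ in threads:
        frontier.put(None)              # one sentinel per worker
    for t in threads:
        t.join()
    return visited
```

The frontier is deliberately left unbounded in this sketch. Because workers both consume from and produce into the same queue, a blocking `put` on a full bounded queue can deadlock the pool once every worker is waiting to enqueue. Backpressure therefore needs a separate mechanism, such as spilling overflow to disk or capping crawl depth. Cancellation could be added by having workers check a shared `threading.Event` before each fetch.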
