PracHub

Design web crawler for 1000 devices

Last updated: Mar 29, 2026

Quick Overview

This interview question evaluates requirements gathering, scale assumptions, API and data design, architecture, trade-offs, failure modes, and rollout planning in a realistic interview setting. A strong answer states its assumptions, handles edge cases, explains trade-offs, and shows clearly how to validate the result.

  • hard
  • Lyft
  • System Design
  • Software Engineer

Company: Lyft

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Onsite

##### Question

Design a web crawler that starts from a single link and distributes crawling across 1,000 different devices. Be ready for follow-ups on coordination, load balancing, fault tolerance, and scalability.

Related Interview Questions

  • Design a Donation Platform - Lyft (hard)
  • Design a scalable real-time chat system - Lyft (hard)
  • Design a distributed web crawler - Lyft (hard)
  • Design a scalable news feed system - Lyft (hard)

Aug 4, 2025, 10:55 AM

Distributed Web Crawler: Design for 1,000 Devices

Context

Design a production-ready web crawler that starts from a single seed URL and scales crawling across 1,000 heterogeneous devices. The crawler should respect robots.txt and per-host politeness constraints, deduplicate URLs/content, and persist pages and metadata.
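URL and content deduplication can be sketched as follows. This is a minimal, single-process illustration under assumed rules: the canonicalization choices (lowercasing scheme and host, dropping default ports, fragments, and userinfo) and the `normalize_url` / `Deduper` names are hypothetical, and the in-memory sets stand in for whatever sharded store or Bloom filter a real deployment would use. Exact content hashing only catches byte-identical pages; near-duplicates would need a similarity hash such as SimHash.

```python
import hashlib
from urllib.parse import urldefrag, urlsplit, urlunsplit

# Assumed canonicalization rules; real crawlers usually add more (query sorting, etc.).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings share one dedup key."""
    url, _fragment = urldefrag(url)  # fragments never change the fetched resource
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # .hostname is already lowercased; userinfo is dropped and IPv6 literals
    # are not re-bracketed in this sketch.
    host = parts.hostname or ""
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    return urlunsplit((scheme, netloc, parts.path or "/", parts.query, ""))


class Deduper:
    """In-memory stand-in for a shared URL-seen set and content-fingerprint index."""

    def __init__(self) -> None:
        self.seen_urls: set[str] = set()
        self.seen_content: set[str] = set()

    def is_new_url(self, url: str) -> bool:
        key = hashlib.sha1(normalize_url(url).encode()).hexdigest()
        if key in self.seen_urls:
            return False
        self.seen_urls.add(key)
        return True

    def is_new_content(self, body: bytes) -> bool:
        # Exact-match fingerprint: catches byte-identical pages served under different URLs.
        digest = hashlib.sha256(body).hexdigest()
        if digest in self.seen_content:
            return False
        self.seen_content.add(digest)
        return True
```

For example, `normalize_url("HTTP://Example.COM:80/a?b=1#top")` returns `"http://example.com/a?b=1"`, so both spellings map to the same dedup key.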

Requirements

  • Start from one seed link and discover new URLs recursively.
  • Distribute crawling across ~1,000 devices.
  • Address:
    1. Coordination of work and state
    2. Load balancing and throttling
    3. Fault tolerance and recovery
    4. Scalability and typical follow-ups
  • Assume an internet-scale target with diverse domains and varying latency.
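One common way to spread the frontier over ~1,000 devices is to partition by host with consistent hashing. Every URL for a given host then lands on the same device (which keeps per-host politeness local), and adding or removing a device only moves the hosts that device owned. The sketch below assumes every device builds the same ring from the same device list; the `HashRing` class, the device names, and the choice of 64 virtual nodes per device are illustrative, not prescribed by the prompt.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a key to a 64-bit position on the ring (MD5 used only for uniformity)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring assigning hosts to devices via virtual nodes."""

    def __init__(self, devices, vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self.devices = set(devices)
        self._rebuild()

    def _rebuild(self) -> None:
        owners = {}
        # Sorted iteration keeps the ring identical on every device that builds it.
        for dev in sorted(self.devices):
            for i in range(self.vnodes):
                owners[ring_position(f"{dev}#{i}")] = dev
        self._owners = owners
        self._points = sorted(owners)

    def add(self, device: str) -> None:
        self.devices.add(device)
        self._rebuild()

    def remove(self, device: str) -> None:
        self.devices.discard(device)
        self._rebuild()

    def owner(self, host: str) -> str:
        # First virtual node clockwise from the host's position, wrapping around.
        idx = bisect.bisect(self._points, ring_position(host)) % len(self._points)
        return self._owners[self._points[idx]]
```

A device that extracts a link would compute `ring.owner(urlsplit(link).hostname)` and forward the URL to that device if it is not the owner itself.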

Deliverables

  • High-level architecture and data flow
  • How URLs are assigned, deduplicated, and scheduled
  • Policies for robots.txt, per-host rate limits, retries
  • Storage approach for frontier state and fetched content
  • Specific mechanisms for coordination, load balancing, fault tolerance, and scaling
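Per-host politeness and robots.txt handling can be sketched with one FIFO queue per host plus a min-heap ordered by each host's next allowed fetch time. This sketch uses Python's standard `urllib.robotparser.RobotFileParser` for Allow/Disallow and `Crawl-delay` rules; the `PoliteScheduler` class, the user-agent string, and the 1-second default delay are hypothetical. Time is passed in explicitly so the policy is deterministic and testable, and fetching robots.txt itself (including what to do when it is unreachable) is left out.

```python
import heapq
from collections import deque
from typing import Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class PoliteScheduler:
    """Hands out URLs so that no host is fetched more often than its delay allows."""

    def __init__(self, user_agent: str = "ExampleCrawler", default_delay: float = 1.0) -> None:
        self.user_agent = user_agent
        self.default_delay = default_delay
        self.queues: dict[str, deque] = {}        # host -> pending URLs
        self.robots: dict[str, RobotFileParser] = {}
        self.next_allowed: dict[str, float] = {}  # host -> earliest next fetch time
        self.ready: list[tuple[float, str]] = []  # heap of (next_allowed_time, host)

    def set_robots(self, host: str, robots_txt: str) -> None:
        rp = RobotFileParser()
        rp.parse(robots_txt.splitlines())
        self.robots[host] = rp

    def enqueue(self, url: str, now: float) -> bool:
        host = urlsplit(url).hostname or ""
        rp = self.robots.get(host)
        if rp is not None and not rp.can_fetch(self.user_agent, url):
            return False  # disallowed by robots.txt
        if host not in self.queues:
            # Invariant: each host with queued URLs has exactly one heap entry.
            self.queues[host] = deque()
            start = max(now, self.next_allowed.get(host, now))
            heapq.heappush(self.ready, (start, host))
        self.queues[host].append(url)
        return True

    def next_url(self, now: float) -> Optional[str]:
        """Return a URL whose host is ready at time `now`, or None if none is."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        url = self.queues[host].popleft()
        delay = self.default_delay
        rp = self.robots.get(host)
        crawl_delay = rp.crawl_delay(self.user_agent) if rp is not None else None
        if crawl_delay is not None:
            delay = max(delay, float(crawl_delay))
        self.next_allowed[host] = now + delay
        if self.queues[host]:
            heapq.heappush(self.ready, (now + delay, host))
        else:
            del self.queues[host]
        return url
```

Remembering `next_allowed` after a host's queue drains matters: without it, a URL discovered for that host a moment later would be fetched immediately and break the delay.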

Constraints & Assumptions

  • Preserve the scope, facts, inputs, and requested outputs from the prompt above.
  • If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
  • Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.

Clarifying Questions to Ask

  • Clarify users, core use cases, read/write patterns, scale, latency, availability, and data retention.
  • State explicit assumptions before making sizing or architecture decisions.
  • Prioritize the functional path first, then address reliability, security, observability, and rollout.
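As an example of stating assumptions before sizing, a back-of-the-envelope estimate might look like this. Every number except the 1,000 devices is an assumption chosen for illustration (10 pages/s per device, 100 KB average page, 0.5 s average fetch latency), and the per-device concurrency comes from Little's law (in-flight requests = throughput × latency).

```python
SECONDS_PER_DAY = 86_400


def estimate_capacity(
    devices: int = 1_000,                  # from the prompt
    pages_per_sec_per_device: float = 10,  # assumption
    avg_page_bytes: int = 100_000,         # assumption: ~100 KB per fetched page
    avg_fetch_latency_s: float = 0.5,      # assumption
) -> dict:
    total_pps = devices * pages_per_sec_per_device
    return {
        "total_pages_per_sec": total_pps,
        "pages_per_day": total_pps * SECONDS_PER_DAY,
        "ingress_bytes_per_sec": total_pps * avg_page_bytes,
        "raw_storage_bytes_per_day": total_pps * avg_page_bytes * SECONDS_PER_DAY,
        # Little's law: concurrent fetches each device needs to sustain its rate.
        "concurrent_fetches_per_device": pages_per_sec_per_device * avg_fetch_latency_s,
    }
```

Under these assumptions the fleet pulls about 1 GB/s (8 Gbit/s), roughly 864 million pages and 86.4 TB of uncompressed content per day, and each device needs only about 5 concurrent fetches. If each host is further limited to one request per second (another assumption), each device's frontier must keep at least 10 distinct hosts ready at any moment.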

What a Strong Answer Covers

  • A scoped requirements summary with concrete non-goals and success metrics.
  • API, data model, architecture, consistency, capacity, and operations.
  • Reasoned trade-offs between simple and scalable designs, including bottlenecks and failure modes.
  • A validation, monitoring, migration, and launch plan appropriate for the risk level.
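For fault tolerance, one widely used pattern is to hand out work under time-limited leases: a device must heartbeat to keep its lease, and work whose lease expires returns to the pending queue so another device can pick it up. The single-process `LeaseQueue` below is a hypothetical sketch of that bookkeeping. In a real deployment this state would live in a replicated store (etcd leases or ZooKeeper ephemeral nodes are common building blocks). Because a slow device may still finish after its lease was reassigned, crawling becomes at-least-once, so page writes should be idempotent (for example, keyed by normalized URL).

```python
import itertools
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Lease:
    worker: str
    expires_at: float


class LeaseQueue:
    """Coordinator-side bookkeeping for leased crawl tasks (e.g. batches of URLs)."""

    def __init__(self, lease_seconds: float = 30.0) -> None:
        self.lease_seconds = lease_seconds
        self.tasks: dict[int, Any] = {}     # task id -> payload
        self.pending: deque = deque()       # task ids waiting for a worker
        self.leases: dict[int, Lease] = {}  # task id -> current lease
        self.done: set[int] = set()
        self._ids = itertools.count()

    def submit(self, payload: Any) -> int:
        tid = next(self._ids)
        self.tasks[tid] = payload
        self.pending.append(tid)
        return tid

    def acquire(self, worker: str, now: float) -> Optional[tuple[int, Any]]:
        self._reclaim_expired(now)
        if not self.pending:
            return None
        tid = self.pending.popleft()
        self.leases[tid] = Lease(worker, now + self.lease_seconds)
        return tid, self.tasks[tid]

    def heartbeat(self, tid: int, worker: str, now: float) -> bool:
        lease = self.leases.get(tid)
        if lease is None or lease.worker != worker or lease.expires_at <= now:
            return False  # lease lost: the worker should abandon this task
        lease.expires_at = now + self.lease_seconds
        return True

    def complete(self, tid: int, worker: str) -> bool:
        lease = self.leases.get(tid)
        if lease is None or lease.worker != worker:
            return False  # stale worker: the task was reclaimed or reassigned
        del self.leases[tid]
        self.done.add(tid)
        return True

    def _reclaim_expired(self, now: float) -> None:
        expired = [tid for tid, lease in self.leases.items() if lease.expires_at <= now]
        for tid in expired:
            del self.leases[tid]
            self.pending.append(tid)
```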

Follow-up Questions

  • What breaks first at 10x traffic or data volume?
  • How would you degrade gracefully during dependency failures?
  • What metrics and alerts would prove the design is healthy after launch?
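One way to degrade gracefully when a site or dependency starts failing is to combine capped exponential backoff with jitter for retries and a per-host circuit breaker that stops sending requests to a failing host for a cooldown period. The sketch below is a hypothetical illustration: the thresholds, the `backoff_delay` and `HostCircuitBreaker` names, and the "full jitter" choice (a uniform delay between zero and the capped exponential bound) are assumptions. It also assumes at most one in-flight request per host, so the half-open state effectively admits a single trial fetch.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    # Exponent clamped so huge attempt counts cannot overflow the float conversion.
    bound = min(cap, base * (2 ** min(attempt, 32)))
    uniform = rng.uniform if rng is not None else random.uniform
    return uniform(0.0, bound)


class HostCircuitBreaker:
    """Opens per host after N consecutive failures; allows a trial after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown: float = 60.0) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures: dict[str, int] = {}      # host -> consecutive failures
        self.opened_at: dict[str, float] = {}   # host -> time the breaker opened

    def allow(self, host: str, now: float) -> bool:
        opened = self.opened_at.get(host)
        if opened is None:
            return True  # closed: traffic flows normally
        return now - opened >= self.cooldown  # half-open once the cooldown has passed

    def record(self, host: str, ok: bool, now: float) -> None:
        if ok:
            # Any success closes the breaker and resets the failure count.
            self.failures.pop(host, None)
            self.opened_at.pop(host, None)
            return
        n = self.failures.get(host, 0) + 1
        self.failures[host] = n
        if n >= self.failure_threshold:
            # Opens the breaker; a failed half-open trial re-opens it and restarts the cooldown.
            self.opened_at[host] = now
```

While a host's breaker is open, its URLs can stay parked in the frontier instead of consuming fetch slots, so one failing site does not slow the rest of the crawl.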

© 2026 PracHub. All rights reserved.