Design a recursive distributed file crawler

Last updated: Apr 2, 2026

Quick Overview

This prompt evaluates expertise in distributed systems, asynchronous task orchestration, recursive job scheduling, data modeling for job and file metadata, and reliability concerns such as retries, idempotency, deduplication, and partial failure handling.

  • medium
  • Dropbox
  • System Design
  • Software Engineer

Company: Dropbox

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite

Jan 25, 2026, 12:00 AM

Design a distributed service that crawls a large file system starting from a root path. A client should be able to call an API to start a crawl job, and background workers should traverse directories and files asynchronously. While processing a directory, a worker may split the work into smaller tasks and enqueue additional async jobs, so the async service can recursively trigger more of its own jobs.
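The recursive fan-out described above can be illustrated with a minimal single-process sketch. It is not a reference solution: the names `process_directory` and `crawl` are hypothetical, an in-memory `queue.Queue` stands in for a real message queue, and a plain list stands in for the metadata store. Symbolic links are skipped here as a simplifying assumption to avoid cycles.

```python
import os
import queue


def process_directory(path, task_queue, results):
    """Handle one crawl task: record files, enqueue one child task per subdirectory."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Recursive step: the worker schedules more work for its own service.
                task_queue.put(entry.path)
            elif entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                results.append((entry.path, size))
            # Anything else (symlinks, sockets, ...) is ignored in this sketch.


def crawl(root):
    """Drain the queue on one thread; real workers would consume it concurrently."""
    task_queue = queue.Queue()
    results = []
    task_queue.put(root)  # the root task is what the "start crawl" API would enqueue
    while not task_queue.empty():
        process_directory(task_queue.get(), task_queue, results)
    return results
```

Because each directory becomes its own task, the traversal needs no call stack proportional to tree depth, and the amount of work in flight is bounded by the queue rather than by any single worker.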

Discuss:

  • APIs for creating a crawl job and checking job status
  • the data model for crawl jobs, crawl tasks, and discovered file metadata
  • how workers recursively schedule child tasks
  • how to handle retries, idempotency, deduplication, and partial failures
  • how to scale to very large directory trees
  • the role of the database, message queue, and async workers
  • how to expose progress and final results to clients
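One possible way to approach the data model, deduplication, and progress points above is sketched below, using an in-memory SQLite database as a stand-in for the production store. The table names, column names, status values, and helper functions (`schedule_task`, `create_job`, `complete_task`, `job_status`) are all assumptions made for illustration. The idea is that a `(job_id, path)` primary key makes task creation idempotent, so a redelivered message cannot create duplicate tasks or duplicate file rows.

```python
import sqlite3
import uuid

# Hypothetical schema: one row per job, per directory task, and per discovered file.
SCHEMA = """
CREATE TABLE crawl_jobs (job_id TEXT PRIMARY KEY, root_path TEXT NOT NULL);
CREATE TABLE crawl_tasks (
    job_id   TEXT NOT NULL,
    path     TEXT NOT NULL,
    status   TEXT NOT NULL DEFAULT 'PENDING',
    attempts INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (job_id, path)
);
CREATE TABLE files (
    job_id TEXT NOT NULL,
    path   TEXT NOT NULL,
    size   INTEGER NOT NULL,
    PRIMARY KEY (job_id, path)
);
"""


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def schedule_task(db, job_id, path):
    """Insert a task once. Returns True only when the row is new, which is
    the signal a caller would use to publish a message to the queue."""
    cur = db.execute(
        "INSERT OR IGNORE INTO crawl_tasks (job_id, path) VALUES (?, ?)",
        (job_id, path),
    )
    return cur.rowcount == 1


def create_job(db, root_path):
    """What a 'start crawl' API handler might do: create the job and its root task."""
    job_id = str(uuid.uuid4())
    with db:
        db.execute(
            "INSERT INTO crawl_jobs (job_id, root_path) VALUES (?, ?)",
            (job_id, root_path),
        )
        schedule_task(db, job_id, root_path)
    return job_id


def complete_task(db, job_id, path, child_dirs, files):
    """Record children, file rows, and DONE status in a single transaction,
    so a crash mid-task leaves the task PENDING and safe to retry."""
    with db:
        for child in child_dirs:
            schedule_task(db, job_id, child)
        db.executemany(
            "INSERT OR IGNORE INTO files (job_id, path, size) VALUES (?, ?, ?)",
            [(job_id, p, s) for p, s in files],
        )
        db.execute(
            "UPDATE crawl_tasks SET status = 'DONE', attempts = attempts + 1 "
            "WHERE job_id = ? AND path = ?",
            (job_id, path),
        )


def job_status(db, job_id):
    """What a 'get status' API handler might return: progress derived from task rows."""
    total, done = db.execute(
        "SELECT COUNT(*), COALESCE(SUM(status = 'DONE'), 0) "
        "FROM crawl_tasks WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    files_found = db.execute(
        "SELECT COUNT(*) FROM files WHERE job_id = ?", (job_id,)
    ).fetchone()[0]
    state = "COMPLETED" if total == done else "RUNNING"
    return {"state": state, "tasks_total": total, "tasks_done": done, "files_found": files_found}
```

In this sketch a job counts as complete when no task row remains pending. That only holds because child tasks are written in the same transaction that marks the parent done, so the pending count cannot reach zero while undiscovered work still exists. Counting rows on every status call is a simplification; a larger deployment would more likely maintain counters on the job row.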
