PracHub

Scale a file-crawling API using async jobs

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design scalable, fault-tolerant systems for long-running file-crawl operations, including async job orchestration, pagination/streaming, concurrency control, caching/deduplication, failure handling, and storage-abstraction considerations.

Company: Dropbox

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Technical Screen

Related Interview Questions

  • Design a recursive distributed file crawler - Dropbox (medium)
  • Design file-processing API with long-running jobs - Dropbox (medium)
  • Design an S3-like Object Store - Dropbox (medium)
Jan 22, 2026, 12:00 AM

You implemented a synchronous API that lists all files under a directory (recursive crawl). Now you need to scale it.

Part A — Scaling discussion (no code required)

How would you scale the “list all files under a path” API when:

  • The directory tree can be huge (millions/billions of files)
  • Crawls can take minutes+ and may time out
  • Many users may request crawls concurrently
  • The underlying storage could be local disks or network storage (e.g., NFS / object-store-like abstraction)

Discuss:

  • API shape (sync vs async)
  • Pagination/streaming of results
  • Concurrency limits and backpressure
  • Caching and deduplication
  • Failure handling and partial results
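As one possible illustration of the pagination and backpressure items above, the following minimal sketch walks a local directory tree iteratively and yields results in fixed-size pages. Because it is a generator, the consumer controls the pace, which is a simple form of backpressure. The names `CrawlLimiter` and `crawl_pages`, the page size, and the reject-when-full admission policy are all assumptions made for illustration, not part of the original question.

```python
import os
import threading
from collections import deque
from typing import Iterator, List


class CrawlLimiter:
    """Hypothetical admission control: at most max_active crawls at once."""

    def __init__(self, max_active: int) -> None:
        self._slots = threading.BoundedSemaphore(max_active)

    def try_acquire(self) -> bool:
        # Non-blocking: a caller that gets False should reject or queue the request.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def crawl_pages(root: str, page_size: int = 1000) -> Iterator[List[str]]:
    """Yield file paths under root in pages of at most page_size entries.

    The walk is iterative (no recursion), so memory is bounded by one page
    plus the queue of directories still to visit.
    """
    pending = deque([root])
    page: List[str] = []
    while pending:
        current = pending.popleft()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        pending.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        page.append(entry.path)
                        if len(page) >= page_size:
                            yield page
                            page = []
        except OSError:
            # Assumption: unreadable directories are skipped, not fatal.
            continue
    if page:
        yield page
```

A caller would check `try_acquire()` before starting a crawl and call `release()` when it finishes. In a distributed setting the semaphore would likely be replaced by a shared queue or rate limiter.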

Part B — Async job implementation (with provided templates)

Based on your design in Part A, implement (or describe at a detailed interface level) an async crawl job:

  • Client submits rootPath and gets back a jobId.
  • Clients can poll job status and fetch results.
  • The job should be robust to failures, support retries, and avoid repeating work excessively.
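The robustness requirements above could be sketched as a job record that checkpoints its frontier (the directories still to visit), so a retry resumes where it failed instead of restarting the crawl. This is a minimal in-memory sketch under stated assumptions: `CrawlJob`, `JobStatus`, `run_job`, the `list_dir` callback, and the attempt budget are hypothetical names, and a real system would persist this state and add backoff between retries.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class JobStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class CrawlJob:
    job_id: str
    root_path: str
    status: JobStatus = JobStatus.PENDING
    attempts: int = 0  # total failed listings across the whole job
    pending_dirs: List[str] = field(default_factory=list)  # checkpointed frontier
    files: List[str] = field(default_factory=list)  # partial results so far


# Hypothetical storage abstraction: returns (subdirectories, files) for a path
# and may raise OSError on transient failures.
ListDir = Callable[[str], Tuple[List[str], List[str]]]


def run_job(job: CrawlJob, list_dir: ListDir, max_attempts: int = 3) -> CrawlJob:
    """Crawl from the checkpointed frontier, retrying failed directories."""
    if job.status is JobStatus.PENDING:
        job.pending_dirs = [job.root_path]
    job.status = JobStatus.RUNNING
    while job.pending_dirs:
        current = job.pending_dirs[-1]
        try:
            subdirs, files = list_dir(current)
        except OSError:
            job.attempts += 1
            if job.attempts >= max_attempts:
                # Keep partial results and the frontier for inspection.
                job.status = JobStatus.FAILED
                return job
            continue  # retry the same directory; finished ones are not re-listed
        # Only drop the directory from the frontier once it was listed.
        job.pending_dirs.pop()
        job.pending_dirs.extend(subdirs)
        job.files.extend(files)
    job.status = JobStatus.SUCCEEDED
    return job
```

Because a directory leaves `pending_dirs` only after it has been listed successfully, a failed job still exposes the files found so far, which is one way to support partial results.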

Specify key components/APIs, e.g.:

  • POST /crawl-jobs → { jobId }
  • GET /crawl-jobs/{jobId} → status/progress
  • GET /crawl-jobs/{jobId}/results?cursor=... → paged results
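To make the three endpoints above concrete, here is a minimal in-memory sketch of the handlers behind them, without any HTTP framework. Everything beyond the endpoint shapes is an assumption for illustration: the `CrawlJobService` class, the `record_files` worker hook, the response field names other than `jobId`, deduplication of active jobs by root path, and the opaque cursor that encodes an offset into an append-only result list.

```python
import base64
import uuid
from typing import Dict, List, Optional


def _encode_cursor(offset: int) -> str:
    return base64.urlsafe_b64encode(str(offset).encode()).decode()


def _decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())


class CrawlJobService:
    """Hypothetical handlers for the crawl-job endpoints (in-memory only)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, dict] = {}
        self._active_by_root: Dict[str, str] = {}

    def create_job(self, root_path: str) -> dict:
        """POST /crawl-jobs: reuse an active job for the same root path."""
        existing = self._active_by_root.get(root_path)
        if existing is not None:
            return {"jobId": existing}
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"rootPath": root_path, "status": "PENDING", "results": []}
        self._active_by_root[root_path] = job_id
        return {"jobId": job_id}

    def record_files(self, job_id: str, paths: List[str], finished: bool = False) -> None:
        """Worker-side hook: append results and optionally mark the job done."""
        job = self._jobs[job_id]
        job["results"].extend(paths)
        job["status"] = "SUCCEEDED" if finished else "RUNNING"
        if finished:
            self._active_by_root.pop(job["rootPath"], None)

    def get_job(self, job_id: str) -> dict:
        """GET /crawl-jobs/{jobId}: status and a simple progress counter."""
        job = self._jobs[job_id]
        return {"jobId": job_id, "status": job["status"], "filesFound": len(job["results"])}

    def get_results(self, job_id: str, cursor: Optional[str] = None, limit: int = 100) -> dict:
        """GET /crawl-jobs/{jobId}/results?cursor=...: one page of results."""
        job = self._jobs[job_id]
        start = _decode_cursor(cursor) if cursor else 0
        items = job["results"][start:start + limit]
        end = start + len(items)
        # Only signal the end once the job is terminal and fully read.
        done = job["status"] in ("SUCCEEDED", "FAILED") and end >= len(job["results"])
        return {"items": items, "nextCursor": None if done else _encode_cursor(end)}
```

An offset cursor only works here because results are append-only. If results could be reordered or compacted, a cursor keyed on the last returned path would be a safer choice.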

State any assumptions (e.g., maximum result size, retention window, storage type).

