Design file-processing API with long-running jobs

Last updated: May 8, 2026

Quick Overview

This question evaluates a candidate's ability to design HTTP APIs and scalable backend architectures for long-running file-processing jobs, covering competencies in API design, job orchestration, distributed systems, fault tolerance, state persistence, and idempotency.


Company: Dropbox

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite


Nov 1, 2025, 12:00 AM

You are asked to design an HTTP-based API and high-level backend architecture for processing files in a cloud storage service (similar to a simplified Dropbox).

The system needs to perform a CPU-intensive operation on files (for example, virus scanning, OCR, or thumbnail generation).

Part 1: Simple synchronous processing API

Start with a basic requirement:

  • The client provides a list of file identifiers (e.g., file IDs or paths owned by the user).
  • The backend must process each of these files and return the result for all files in a single response.
  • Initially, you may assume that processing each file is relatively fast (e.g., under a second each) and the total request can reasonably complete within typical HTTP timeouts.

Design:

  • A clear REST-style API: request and response formats for a synchronous endpoint like POST /processFiles.
  • High-level backend components and data flow for handling this request (no need for very low-level implementation details).

Address in your design:

  • How the client specifies which files to process.
  • What information the response returns (per-file success/failure, output data, errors).
  • Basic error handling (e.g., if some files fail).
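As a point of reference for the three items above, here is a minimal Python sketch of one possible request and response shape for the synchronous endpoint. The field names (file_ids, results, summary), the FileResult type, and the fake_scan stand-in are all assumptions made for illustration; the question does not prescribe them.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Optional


@dataclass
class FileResult:
    file_id: str
    status: str                      # "succeeded" or "failed"
    output: Optional[dict] = None    # processing output when the file succeeded
    error: Optional[str] = None      # error message when the file failed


def process_files(file_ids: list[str],
                  process_one: Callable[[str], dict]) -> dict:
    """Handle a hypothetical POST /processFiles body: {"file_ids": [...]}.

    Each file is processed independently so that one failure does not
    discard the results of the others.
    """
    results = []
    for file_id in file_ids:
        try:
            output = process_one(file_id)
            results.append(FileResult(file_id, "succeeded", output=output))
        except Exception as exc:
            # Report the error per file instead of failing the whole request.
            results.append(FileResult(file_id, "failed", error=str(exc)))
    failed = sum(r.status == "failed" for r in results)
    return {
        "results": [asdict(r) for r in results],
        "summary": {"total": len(results), "failed": failed},
    }


def fake_scan(file_id: str) -> dict:
    # Stand-in for the CPU-intensive step (virus scan, OCR, thumbnailing).
    if file_id.startswith("bad"):
        raise ValueError("file not found")
    return {"clean": True}
```

Calling process_files(["f1", "bad2"], fake_scan) returns one entry per file, so the client can see which files failed and why. Whether a partially failed request should still return HTTP 200 is a design choice the candidate is expected to justify.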

Part 2: Handling long-running requests (follow-up)

Now assume that processing each file can take a long time, ranging from several seconds to several minutes, and the list may contain many files. A fully synchronous HTTP request will often:

  • Exceed frontend or load balancer timeouts.
  • Provide a poor user experience if the client must hold a connection open for the whole duration.

Extend your design to handle long-running processing robustly.

New requirements:

  • The client can still submit a list of files to be processed.
  • The request should return quickly (within a few seconds) even if total processing will take minutes.
  • The client must be able to:
    • Track the status of the processing job (e.g., pending, in-progress, completed, failed).
    • Obtain per-file results once processing is complete.
  • The system should handle:
    • Server restarts and crashes.
    • Retries from clients (idempotency concerns).
    • Scaling to many concurrent jobs.
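One common way to meet these requirements is a submit-then-poll API, sketched below in Python. Everything here is an assumption for illustration: the POST /jobs and GET /jobs/{job_id} routes, the client-supplied idempotency key, the rule for deriving job status from per-file states, and the in-memory dictionaries, which stand in for a durable store that would survive restarts.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    file_status: dict[str, str]      # file_id -> pending|in_progress|succeeded|failed
    results: dict[str, dict] = field(default_factory=dict)

    def status(self) -> str:
        """Derive the job status from the per-file states."""
        states = set(self.file_status.values())
        if states <= {"pending"}:
            return "pending"
        if states & {"pending", "in_progress"}:
            return "in_progress"
        # All files are terminal; the job counts as failed only if every file failed.
        return "failed" if states == {"failed"} else "completed"


class JobService:
    """In-memory stand-in for the API tier and its job store."""

    def __init__(self) -> None:
        self.jobs: dict[str, Job] = {}
        # (user_id, idempotency_key) -> job_id
        self.by_key: dict[tuple[str, str], str] = {}

    def submit(self, user_id: str, idempotency_key: str,
               file_ids: list[str]) -> tuple[int, dict]:
        """Hypothetical POST /jobs: records the job and returns immediately."""
        key = (user_id, idempotency_key)
        if key in self.by_key:
            # Client retry: return the original job instead of creating a duplicate.
            return 200, {"job_id": self.by_key[key]}
        job = Job(str(uuid.uuid4()), {f: "pending" for f in file_ids})
        self.jobs[job.job_id] = job
        self.by_key[key] = job.job_id
        return 202, {"job_id": job.job_id}

    def get(self, job_id: str) -> tuple[int, dict]:
        """Hypothetical GET /jobs/{job_id}: status and per-file results combined."""
        job = self.jobs.get(job_id)
        if job is None:
            return 404, {"error": "job not found"}
        return 200, {"job_id": job_id, "status": job.status(),
                     "files": job.file_status, "results": job.results}
```

The submit call does no processing itself, so it returns within the time it takes to record the job. Background workers would update file_status and results, and the client polls the status endpoint until the job reaches a terminal state.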

What to cover in your answer

Describe a high-level design that includes:

  1. API surface
    • Endpoints for:
      • Submitting a processing job for a list of files.
      • Checking job status.
      • Retrieving results (and whether status and results are combined or separate).
    • Request and response shapes at a high level.
  2. Architecture and components
    • How you will store jobs and their state (e.g., database schema at a conceptual level).
    • How you will perform the actual file processing (e.g., background workers, queues).
    • How work is distributed and scaled across multiple machines.
  3. Long-running job handling
    • How you avoid tying up HTTP connections for the duration of processing.
    • How the client can safely retry requests without creating duplicate jobs.
    • How to handle partial failures (some files succeed, others fail).
  4. Reliability and scalability considerations
    • Handling failures and restarts: ensuring jobs are not lost and are eventually completed or marked failed.
    • Idempotency and deduplication strategies.
    • Basic performance considerations and bottlenecks.

You do not need to write actual code, but explain your design clearly enough that an experienced engineer could implement it from your description.
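Although code is not required, a small sketch can make the reliability item concrete. The Python below shows one possible lease-based approach to claiming per-file tasks, so that work held by a crashed worker becomes claimable again once its lease expires. The Task and TaskStore names, the 60-second lease, and the retry budget of 3 are assumptions, and the in-memory list stands in for a durable task table or queue.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 60.0   # assumed lease (visibility timeout) per claim
MAX_ATTEMPTS = 3       # assumed retry budget per file


@dataclass
class Task:
    job_id: str
    file_id: str
    state: str = "pending"           # pending | in_progress | succeeded | failed
    attempts: int = 0
    lease_expires_at: float = 0.0


class TaskStore:
    """In-memory stand-in for a durable task table or queue."""

    def __init__(self) -> None:
        self.tasks: list[Task] = []

    def claim(self, now: float) -> Optional[Task]:
        """Hand out one task: a pending one, or one whose lease has expired."""
        for task in self.tasks:
            expired = (task.state == "in_progress"
                       and task.lease_expires_at <= now)
            if task.state != "pending" and not expired:
                continue
            if task.attempts >= MAX_ATTEMPTS:
                # Give up instead of retrying forever.
                task.state = "failed"
                continue
            task.state = "in_progress"
            task.attempts += 1
            task.lease_expires_at = now + LEASE_SECONDS
            return task
        return None

    def complete(self, task: Task, ok: bool) -> None:
        """Record the terminal state reported by the worker."""
        task.state = "succeeded" if ok else "failed"
```

Because a slow worker may finish after its lease has expired and another worker has taken over the same task, this scheme gives at-least-once execution. The per-file processing step therefore needs to be safe to run more than once.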
