You are asked to design an HTTP-based API and high-level backend architecture for processing files in a cloud storage service (similar to a simplified Dropbox).
The system needs to perform a CPU-intensive operation on files (for example, virus scanning, OCR, or thumbnail generation).
Part 1: Simple synchronous processing API
Start with a basic set of requirements:
- The client provides a list of file identifiers (e.g., file IDs or paths owned by the user).
- The backend must process each of these files and return the results for all files in a single response.
- Initially, you may assume that processing each file is relatively fast (e.g., under a second each) and that the total request can reasonably complete within typical HTTP timeouts.
Design:
- A clear REST-style API: request and response formats for a synchronous endpoint like POST /processFiles (an illustrative request/response sketch follows this list).
- High-level backend components and the data flow for handling this request (no need for very low-level implementation details).
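For illustration, the synchronous request and response could be shaped roughly as follows; the field names are assumptions chosen for concreteness, not part of the requirements.

```typescript
// Illustrative shapes for POST /processFiles (field names are assumptions).
interface ProcessFilesRequest {
  fileIds: string[];                        // identifiers of files owned by the caller
}

interface PerFileResult {
  fileId: string;
  status: "succeeded" | "failed";           // per-file outcome
  output?: unknown;                         // e.g. scan verdict, OCR text, thumbnail URL
  error?: { code: string; message: string };
}

interface ProcessFilesResponse {
  results: PerFileResult[];                 // one entry per requested file
}
```

Returning one entry per requested file keeps partial failure explicit without overloading the overall HTTP status code.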
Address the following in your design:
- How the client specifies which files to process.
- What information the response returns (per-file success/failure, output data, errors).
- Basic error handling (e.g., if some files fail); a minimal handler sketch follows this list.
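As a minimal sketch of that per-file error handling (reusing the shapes above and assuming a hypothetical processFile() helper for the CPU-intensive work), a handler could collect one result per file so that a single failure does not fail the whole request:

```typescript
// Sketch only: processFile() is a hypothetical helper, not a given API.
// Reuses ProcessFilesRequest / PerFileResult / ProcessFilesResponse from the sketch above.
declare function processFile(fileId: string): Promise<unknown>;

async function handleProcessFiles(req: ProcessFilesRequest): Promise<ProcessFilesResponse> {
  const results = await Promise.all(
    req.fileIds.map(async (fileId): Promise<PerFileResult> => {
      try {
        const output = await processFile(fileId);
        return { fileId, status: "succeeded", output };
      } catch (err) {
        // One file failing is reported per-file; the overall request can still return 200.
        return {
          fileId,
          status: "failed",
          error: { code: "PROCESSING_ERROR", message: String(err) },
        };
      }
    })
  );
  return { results };
}
```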
Part 2: Handling long-running requests (follow-up)
Now assume that processing each file can take a long time, ranging from several seconds to several minutes, and the list may contain many files. A fully synchronous HTTP request will often:
- Exceed frontend or load balancer timeouts.
- Provide a poor user experience if the client must wait on a long-lived open connection.
Extend your design to handle long-running processing robustly.
New requirements:
- The client can still submit a list of files to be processed.
- The request should return quickly (within a few seconds), even if total processing will take minutes.
- The client must be able to:
  - Track the status of the processing job (e.g., pending, in-progress, completed, failed).
  - Obtain per-file results once processing is complete (an illustrative API shape for these operations follows this list).
- The system should handle:
  - Server restarts and crashes.
  - Retries from clients (idempotency concerns).
  - Scaling to many concurrent jobs.
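For concreteness, one way to shape such an asynchronous API is sketched below; the paths, field names, and the Idempotency-Key header are assumptions for illustration rather than the expected answer.

```typescript
// Illustrative shapes for an asynchronous job API (names are assumptions).

// POST /jobs — submit a processing job; the client may send an Idempotency-Key
// header so that retrying the same submission does not create a duplicate job.
interface SubmitJobRequest {
  fileIds: string[];
}
interface SubmitJobResponse {
  jobId: string;                            // returned quickly, e.g. with HTTP 202 Accepted
}

// GET /jobs/{jobId} — poll for status and coarse progress.
type JobStatus = "pending" | "in_progress" | "completed" | "failed";
interface JobStatusResponse {
  jobId: string;
  status: JobStatus;
  filesTotal: number;
  filesDone: number;
}

// GET /jobs/{jobId}/results — per-file results once processing has finished.
interface JobResultsResponse {
  jobId: string;
  results: PerFileResult[];                 // same per-file shape as in the Part 1 sketch
}
```

Whether status and results live on one endpoint or two is a trade-off worth addressing explicitly in your answer.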
What to cover in your answer
Describe a high-level design that includes:
- API surface
  - Endpoints for:
    - Submitting a processing job for a list of files.
    - Checking job status.
    - Retrieving results (and whether status and results are combined or separate).
  - Request and response shapes at a high level.
- Architecture and components
  - How you will store jobs and their state (e.g., a database schema at a conceptual level; a sketch follows this list).
  - How you will perform the actual file processing (e.g., background workers, queues).
  - How work is distributed and scaled across multiple machines.
- Long-running job handling
  - How you avoid tying up HTTP connections for the duration of processing.
  - How the client can safely retry requests without creating duplicate jobs.
  - How to handle partial failures (some files succeed, others fail).
- Reliability and scalability considerations
  - Handling failures and restarts: ensuring jobs are not lost and are eventually completed or marked failed.
  - Idempotency and deduplication strategies.
  - Basic performance considerations and bottlenecks.
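As a rough, non-prescriptive sketch of the storage and worker points above (field names, the queue API, and helper functions are all assumptions), the conceptual job state and a crash-tolerant worker loop might look like this:

```typescript
// Conceptual sketch only; field names, the queue API, and helpers are assumptions.
interface JobRecord {
  jobId: string;
  idempotencyKey?: string;                  // deduplicates client retries of the same submission
  status: "pending" | "in_progress" | "completed" | "failed";
  createdAt: string;
}

interface JobFileRecord {
  jobId: string;
  fileId: string;
  status: "pending" | "in_progress" | "succeeded" | "failed";
  output?: unknown;
  error?: string;
  attempts: number;                         // bounds retries after worker crashes
}

// Hypothetical durable queue and persistence helpers.
declare const queue: {
  receive(): Promise<{ jobId: string; fileId: string; ack(): Promise<void> }>;
};
declare function processFile(fileId: string): Promise<unknown>;
declare function saveFileResult(r: JobFileRecord): Promise<void>;

// Each worker leases one file-level task, records its result, then acknowledges.
// If the worker crashes before ack(), the queue redelivers the task to another
// worker, so jobs survive restarts at the cost of possible duplicate processing.
async function workerLoop(): Promise<void> {
  for (;;) {
    const task = await queue.receive();
    try {
      const output = await processFile(task.fileId);
      await saveFileResult({ jobId: task.jobId, fileId: task.fileId, status: "succeeded", output, attempts: 1 });
    } catch (err) {
      await saveFileResult({ jobId: task.jobId, fileId: task.fileId, status: "failed", error: String(err), attempts: 1 });
    }
    await task.ack();                       // acknowledge only after the result is durably recorded
  }
}
```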
You do not need to write actual code, but explain your design clearly enough that an experienced engineer could implement it from your description.