You are asked to design an HTTP-based API and high-level backend architecture for processing files in a cloud storage service (similar to a simplified Dropbox).
The system needs to perform a CPU-intensive operation on files (for example, virus scanning, OCR, or thumbnail generation).
Part 1: Simple synchronous processing API
Start with a basic set of requirements:
- The client provides a list of file identifiers (e.g., file IDs or paths owned by the user).
- The backend must process each of these files and return the results for all files in a single response.
- Initially, you may assume that processing each file is relatively fast (e.g., under a second each) and that the total request can reasonably complete within typical HTTP timeouts.
Design:
- A clear REST-style API: request and response formats for a synchronous endpoint like POST /processFiles (an illustrative request/response sketch follows this list).
- High-level backend components and the data flow for handling this request (no need for very low-level implementation details).
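For illustration, the synchronous request and response could be shaped roughly as follows; the field names are assumptions chosen for concreteness, not part of the requirements.

```typescript
// Illustrative shapes for POST /processFiles (field names are assumptions).
interface ProcessFilesRequest {
  fileIds: string[];                        // identifiers of files owned by the caller
}

interface PerFileResult {
  fileId: string;
  status: "succeeded" | "failed";           // per-file outcome
  output?: unknown;                         // e.g. scan verdict, OCR text, thumbnail URL
  error?: { code: string; message: string };
}

interface ProcessFilesResponse {
  results: PerFileResult[];                 // one entry per requested file
}
```

Returning one entry per requested file keeps partial failure explicit without overloading the overall HTTP status code.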
Address the following in your design:
- How the client specifies which files to process.
- What information the response returns (per-file success/failure, output data, errors).
- Basic error handling (e.g., if some files fail); a minimal handler sketch follows this list.
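As a minimal sketch of that per-file error handling (reusing the shapes above and assuming a hypothetical processFile() helper for the CPU-intensive work), a handler could collect one result per file so that a single failure does not fail the whole request:

```typescript
// Sketch only: processFile() is a hypothetical helper, not a given API.
// Reuses ProcessFilesRequest / PerFileResult / ProcessFilesResponse from the sketch above.
declare function processFile(fileId: string): Promise<unknown>;

async function handleProcessFiles(req: ProcessFilesRequest): Promise<ProcessFilesResponse> {
  const results = await Promise.all(
    req.fileIds.map(async (fileId): Promise<PerFileResult> => {
      try {
        const output = await processFile(fileId);
        return { fileId, status: "succeeded", output };
      } catch (err) {
        // One file failing is reported per-file; the overall request can still return 200.
        return {
          fileId,
          status: "failed",
          error: { code: "PROCESSING_ERROR", message: String(err) },
        };
      }
    })
  );
  return { results };
}
```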
Part 2: Handling long-running requests (follow-up)
Now assume that processing each file can take a long time, ranging from several seconds to several minutes, and the list may contain many files. A fully synchronous HTTP request will often:
- Exceed frontend or load balancer timeouts.
- Provide a poor user experience if the client must wait on a long-lived open connection.
Extend your design to handle long-running processing robustly.
New requirements:
- The client can still submit a list of files to be processed.
- The request should return quickly (within a few seconds), even if total processing will take minutes.
- The client must be able to:
  - Track the status of the processing job (e.g., pending, in-progress, completed, failed).
  - Obtain per-file results once processing is complete (an illustrative API shape for these operations follows this list).
- The system should handle:
  - Server restarts and crashes.
  - Retries from clients (idempotency concerns).
  - Scaling to many concurrent jobs.
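For concreteness, one way to shape such an asynchronous API is sketched below; the paths, field names, and the Idempotency-Key header are assumptions for illustration rather than the expected answer.

```typescript
// Illustrative shapes for an asynchronous job API (names are assumptions).

// POST /jobs — submit a processing job; the client may send an Idempotency-Key
// header so that retrying the same submission does not create a duplicate job.
interface SubmitJobRequest {
  fileIds: string[];
}
interface SubmitJobResponse {
  jobId: string;                            // returned quickly, e.g. with HTTP 202 Accepted
}

// GET /jobs/{jobId} — poll for status and coarse progress.
type JobStatus = "pending" | "in_progress" | "completed" | "failed";
interface JobStatusResponse {
  jobId: string;
  status: JobStatus;
  filesTotal: number;
  filesDone: number;
}

// GET /jobs/{jobId}/results — per-file results once processing has finished.
interface JobResultsResponse {
  jobId: string;
  results: PerFileResult[];                 // same per-file shape as in the Part 1 sketch
}
```

Whether status and results live on one endpoint or two is a trade-off worth addressing explicitly in your answer.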
What to cover in your answer
Describe a high-level design that includes:
- API surface
  - Endpoints for:
    - Submitting a processing job for a list of files.
    - Checking job status.
    - Retrieving results (and whether status and results are combined or separate).
  - Request and response shapes at a high level.
- Architecture and components
  - How you will store jobs and their state (e.g., a database schema at a conceptual level; a sketch follows this list).
  - How you will perform the actual file processing (e.g., background workers, queues).
  - How work is distributed and scaled across multiple machines.
- Long-running job handling
  - How you avoid tying up HTTP connections for the duration of processing.
  - How the client can safely retry requests without creating duplicate jobs.
  - How to handle partial failures (some files succeed, others fail).
- Reliability and scalability considerations
  - Handling failures and restarts: ensuring jobs are not lost and are eventually completed or marked failed.
  - Idempotency and deduplication strategies.
  - Basic performance considerations and bottlenecks.
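As a rough, non-prescriptive sketch of the storage and worker points above (field names, the queue API, and helper functions are all assumptions), the conceptual job state and a crash-tolerant worker loop might look like this:

```typescript
// Conceptual sketch only; field names, the queue API, and helpers are assumptions.
interface JobRecord {
  jobId: string;
  idempotencyKey?: string;                  // deduplicates client retries of the same submission
  status: "pending" | "in_progress" | "completed" | "failed";
  createdAt: string;
}

interface JobFileRecord {
  jobId: string;
  fileId: string;
  status: "pending" | "in_progress" | "succeeded" | "failed";
  output?: unknown;
  error?: string;
  attempts: number;                         // bounds retries after worker crashes
}

// Hypothetical durable queue and persistence helpers.
declare const queue: {
  receive(): Promise<{ jobId: string; fileId: string; ack(): Promise<void> }>;
};
declare function processFile(fileId: string): Promise<unknown>;
declare function saveFileResult(r: JobFileRecord): Promise<void>;

// Each worker leases one file-level task, records its result, then acknowledges.
// If the worker crashes before ack(), the queue redelivers the task to another
// worker, so jobs survive restarts at the cost of possible duplicate processing.
async function workerLoop(): Promise<void> {
  for (;;) {
    const task = await queue.receive();
    try {
      const output = await processFile(task.fileId);
      await saveFileResult({ jobId: task.jobId, fileId: task.fileId, status: "succeeded", output, attempts: 1 });
    } catch (err) {
      await saveFileResult({ jobId: task.jobId, fileId: task.fileId, status: "failed", error: String(err), attempts: 1 });
    }
    await task.ack();                       // acknowledge only after the result is durably recorded
  }
}
```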
You do not need to write actual code, but explain your design clearly enough that an experienced engineer could implement it from your description.