Design a system to process m input images with n pipelines, producing m×n outputs.
- Pipelines are sequences of image operations (resize/rotate/filter/etc.); a minimal sketch of this representation follows the list.
- Users can submit jobs (a set of images plus one or more pipelines).
- The system must run at large scale (many images, many jobs) with reasonable cost and reliability.
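Before the questions, here is one minimal way to pin down what a pipeline is: an ordered list of image-to-image operations. The `Op`/`Pipeline`/`run_pipeline` names and the use of Pillow are illustrative assumptions, not part of the problem statement.

```python
# Sketch only: one possible pipeline representation, assuming Pillow for the ops.
from dataclasses import dataclass
from typing import Callable
from PIL import Image, ImageFilter

# An operation maps an image to an image, so pipelines compose by ordering.
Op = Callable[[Image.Image], Image.Image]

@dataclass
class Pipeline:
    name: str
    ops: list[Op]  # applied left to right

def run_pipeline(img: Image.Image, pipeline: Pipeline) -> Image.Image:
    for op in pipeline.ops:
        img = op(img)
    return img

# Two example pipelines; running both over one decoded image yields 2 of the
# m*n outputs while reading the source image only once.
thumbnail = Pipeline("thumbnail", [lambda im: im.resize((128, 128))])
rotate_blur = Pipeline("rotate_blur", [lambda im: im.rotate(90),
                                       lambda im: im.filter(ImageFilter.BLUR)])
```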
Answer the following:
- What components would you build (APIs, storage, queues, workers, metadata DB)?
- How would you parallelize work and avoid waste (e.g., avoid re-reading the same image repeatedly)? See the fan-out sketch after this list.
- How would you ensure fault tolerance, retries, idempotency, and observability? See the idempotency sketch after this list.
- What are the key bottlenecks and optimizations (CPU vs I/O, caching, batching, intermediate results)?
- How would you justify your scaling approach (threads vs processes vs distributed workers; serverless vs containers)?
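For the parallelization question, a minimal single-machine sketch: make the unit of work one image rather than one (image, pipeline) pair, so each image is read once and shared across all n pipelines. The stand-in byte-level "operations", file paths, and pool size are assumptions for illustration.

```python
# Minimal fan-out sketch, assuming images fit in worker memory. The work unit
# is one image, so each image is read once and shared across all n pipelines
# (m reads total instead of m*n). All names here are illustrative stand-ins.
from concurrent.futures import ThreadPoolExecutor

PIPELINES = {"thumb": lambda data: data + b"|thumb",  # stand-in operations; real
             "blur":  lambda data: data + b"|blur"}   # ones would transform pixels

def read_image(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()           # one read per image, not one per task

def process_image(path: str) -> dict[str, bytes]:
    data = read_image(path)       # read once, fanned out to all n pipelines
    return {name: run(data) for name, run in PIPELINES.items()}

def fan_out(paths: list[str]) -> list[dict[str, bytes]]:
    # Threads cover the I/O-bound reads; for CPU-bound operations, a
    # ProcessPoolExecutor sidesteps the GIL at the cost of pickling overhead.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(process_image, paths))
```

The same grouping carries over to a distributed design: enqueue "image X with pipelines [a, b]" rather than m×n single-pair messages, or cache decoded images so co-scheduled tasks can share them.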
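For the idempotency question, a minimal sketch assuming an at-least-once queue and an object store with a read-before-write check; `ObjectStore`, `TransientError`, and `process_task` are in-memory stand-ins, not a specific vendor API.

```python
# Sketch only: deterministic output keys make retries and duplicate
# deliveries safe. The store and error types below are illustrative stand-ins.
import hashlib

class TransientError(Exception):
    """Retryable failure, e.g. a storage timeout."""

class ObjectStore:
    """In-memory stand-in for an object store with exists/put."""
    def __init__(self):
        self._objects: dict[str, bytes] = {}
    def exists(self, key: str) -> bool:
        return key in self._objects
    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

def task_key(image_id: str, pipeline_id: str) -> str:
    # Same (image, pipeline) pair -> same key, so a retry overwrites
    # (or skips) its own output instead of creating a duplicate.
    return hashlib.sha256(f"{image_id}:{pipeline_id}".encode()).hexdigest()

def process_task(task: dict) -> bytes:
    # Placeholder for "decode the image and run the pipeline".
    return f"output:{task['image_id']}:{task['pipeline_id']}".encode()

def handle_task(task: dict, store: ObjectStore, max_attempts: int = 3) -> bool:
    """Returns True when the output exists; False means dead-letter the task."""
    key = task_key(task["image_id"], task["pipeline_id"])
    if store.exists(key):
        return True  # duplicate delivery: output already written, skip the work
    for attempt in range(max_attempts):
        try:
            store.put(key, process_task(task))
            return True
        except TransientError:
            continue  # transient failure: retry up to max_attempts
    return False
```

Because the output key is a pure function of (image_id, pipeline_id), a worker that crashes after writing but before acknowledging the message simply finds the object on redelivery and acknowledges without redoing the work.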