Design an image-to-multilanguage translator

This is a System Design interview question from Axon for Software Engineer roles.

Question

Scenario

Design a service that lets users upload an image containing text (e.g., a menu, screenshot, street sign), and returns the same image with the text translated into a target language (or multiple target languages).

Requirements

Functional

Upload an image and select target language(s).
System performs OCR to detect text and bounding boxes.
Translate detected text into target language(s).
Return:
- A translated image (text rendered back onto the image), and/or
- A structured result: detected text blocks, bounding boxes, original text, translated text.
Support common image formats (JPG/PNG). Handle large images.
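
The structured result described above could be modeled roughly as follows. This is a minimal sketch, not a prescribed schema: the class names (`BoundingBox`, `TextBlock`, `TranslationResult`), the field names, and the pixel-based box convention are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class BoundingBox:
    # Assumed convention: top-left corner plus size, in pixels.
    x: int
    y: int
    width: int
    height: int


@dataclass
class TextBlock:
    box: BoundingBox
    original_text: str
    source_language: str
    # Maps target language code -> translated text, e.g. {"es": "..."}.
    translations: dict[str, str] = field(default_factory=dict)


@dataclass
class TranslationResult:
    image_id: str
    target_languages: list[str]
    blocks: list[TextBlock] = field(default_factory=list)
    # Maps target language code -> location of the rendered image, if any.
    rendered_images: dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, lists, and dicts.
        return json.dumps(asdict(self), ensure_ascii=False)
```

Keeping translations keyed by language inside each block means one OCR pass can serve any number of target languages, and a client that only wants the structured result never has to wait for image rendering.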

Non-functional

Low latency for interactive usage (single image).
Scalable for bursts (batch-like spikes).
Reliability: retries, idempotency, job tracking.
Security & privacy: images may contain sensitive content.
Observability: metrics, logs, tracing.
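
One way to approach the reliability requirement (retries, idempotency, job tracking) is sketched below. It assumes the idempotency key is derived from the image bytes and the set of target languages; a client-supplied key is an equally common choice. `JobStore`, `idempotency_key`, and `with_retries` are hypothetical names, and the in-memory dictionary stands in for a durable job table.

```python
import hashlib
import time
import uuid
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    status: str  # e.g. "queued", "running", "done", "failed"


def idempotency_key(image_bytes, target_languages):
    digest = hashlib.sha256(image_bytes)
    # Sort and deduplicate so ["es", "fr"] and ["fr", "es"] give the same key.
    for lang in sorted(set(target_languages)):
        digest.update(b"\x00" + lang.encode("utf-8"))
    return digest.hexdigest()


class JobStore:
    """In-memory stand-in for a durable job table (assumption)."""

    def __init__(self):
        self._by_key = {}

    def submit(self, image_bytes, target_languages):
        key = idempotency_key(image_bytes, target_languages)
        # A repeated submission returns the existing job instead of a new one.
        if key not in self._by_key:
            self._by_key[key] = Job(job_id=str(uuid.uuid4()), status="queued")
        return self._by_key[key]


def with_retries(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    # Retry with exponential backoff; re-raise after the final attempt.
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

In a multi-user service the key would likely also include the user or tenant identifier, so that one user's upload can never resolve to another user's job.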

Deep-dive areas to be ready for

Frontend ↔ backend communication and security (authn/authz, request integrity).
Data model: what tables/documents exist, what each is used for.
How you store images and derived artifacts (OCR outputs, translations, rendered images).
Handling multiple target languages per image and caching.
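
For the last point, a possible caching layout is to run OCR once per image and cache translations per (text, target language) pair, so that adding a language never repeats OCR. The sketch below assumes this layout; `TranslationPipeline` is a hypothetical class, and the `ocr` and `translate` callables are placeholders for whatever engines the design actually uses.

```python
import hashlib


class TranslationPipeline:
    def __init__(self, ocr, translate):
        # ocr(image_bytes) -> list of text strings
        # translate(text, target_language) -> translated string
        # Both are hypothetical callables standing in for real engines.
        self._ocr = ocr
        self._translate = translate
        self._ocr_cache = {}          # image hash -> detected texts
        self._translation_cache = {}  # (text, language) -> translation

    def run(self, image_bytes, target_languages):
        # OCR is keyed by content hash, so it runs once per distinct image.
        image_key = hashlib.sha256(image_bytes).hexdigest()
        if image_key not in self._ocr_cache:
            self._ocr_cache[image_key] = self._ocr(image_bytes)
        texts = self._ocr_cache[image_key]

        results = {}
        for lang in target_languages:
            translated = []
            for text in texts:
                key = (text, lang)
                # Translate each distinct (text, language) pair only once.
                if key not in self._translation_cache:
                    self._translation_cache[key] = self._translate(text, lang)
                translated.append(self._translation_cache[key])
            results[lang] = translated
        return results
```

Because the images may contain sensitive content, a real cache would probably need expiry and per-user or per-tenant scoping rather than the global dictionaries used here.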