Design an image-to-multilanguage translator
Company: Axon
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Technical Screen
## Scenario
Design a service that lets users upload an image containing text (e.g., a menu, screenshot, or street sign) and returns the same image with the text translated into one or more target languages.
## Requirements
### Functional
- Upload an image and select **target language(s)**.
- Run **OCR** to detect text blocks and their bounding boxes.
- Translate detected text into target language(s).
- Return:
  - A **translated image** (text rendered back onto the image), and/or
  - A structured result: detected text blocks, bounding boxes, original text, translated text.
- Support common image formats (JPG/PNG). Handle large images.
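One possible shape for the structured result is sketched below. This is a minimal illustration, not a prescribed schema: the class names (`BoundingBox`, `TextBlock`, `TranslationResult`), the field names, and the top-left pixel coordinate convention are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class BoundingBox:
    # Pixel coordinates in the original image; top-left origin is assumed.
    x: int
    y: int
    width: int
    height: int


@dataclass
class TextBlock:
    box: BoundingBox
    original_text: str
    source_language: str  # detected language code, e.g. "fr"
    # Maps target language code -> translated text for this block.
    translations: Dict[str, str] = field(default_factory=dict)


@dataclass
class TranslationResult:
    image_id: str  # hypothetical identifier assigned at upload time
    target_languages: List[str]
    blocks: List[TextBlock]
    # Maps target language code -> storage key of the rendered image.
    rendered_images: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, lists, and dicts.
        return json.dumps(asdict(self), ensure_ascii=False)


result = TranslationResult(
    image_id="img-123",
    target_languages=["en", "es"],
    blocks=[
        TextBlock(
            box=BoundingBox(x=40, y=12, width=180, height=32),
            original_text="Soupe du jour",
            source_language="fr",
            translations={"en": "Soup of the day", "es": "Sopa del día"},
        )
    ],
)
payload = json.loads(result.to_json())
```

Keeping translations keyed by language inside each block means one response can carry several target languages without repeating the OCR output.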
### Non-functional
- Low latency for interactive usage (single image).
- Scalable for bursts (batch-like spikes).
- Reliability: retries, idempotency, job tracking.
- Security & privacy: images may contain sensitive content.
- Observability: metrics, logs, tracing.
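The idempotency and job-tracking requirement can be illustrated with a small sketch. It assumes the key is derived from the user, the image bytes, and the normalized set of target languages; that composition, the `idempotency_key` helper, and the in-memory `JobStore` are hypothetical stand-ins. A real service would more likely enforce this with a unique constraint in a database.

```python
import hashlib
import uuid
from typing import Dict, Iterable


def idempotency_key(user_id: str, image_bytes: bytes, target_languages: Iterable[str]) -> str:
    """Derive a stable key from who asked, which image, and which languages."""
    # Normalize so that ["es", "EN"] and ["en", "es"] produce the same key.
    langs = ",".join(sorted({lang.lower() for lang in target_languages}))
    digest = hashlib.sha256()
    digest.update(user_id.encode("utf-8"))
    digest.update(b"\x00")  # separator so adjacent fields cannot run together
    digest.update(hashlib.sha256(image_bytes).digest())
    digest.update(b"\x00")
    digest.update(langs.encode("utf-8"))
    return digest.hexdigest()


class JobStore:
    """In-memory stand-in for a jobs table with a unique index on the key."""

    def __init__(self) -> None:
        self._job_id_by_key: Dict[str, str] = {}
        self.status: Dict[str, str] = {}

    def submit(self, key: str) -> str:
        existing = self._job_id_by_key.get(key)
        if existing is not None:
            # A retried request gets the original job back; nothing is re-enqueued.
            return existing
        job_id = str(uuid.uuid4())
        self._job_id_by_key[key] = job_id
        self.status[job_id] = "QUEUED"
        return job_id


store = JobStore()
key_a = idempotency_key("user-1", b"fake-image-bytes", ["es", "EN"])
key_b = idempotency_key("user-1", b"fake-image-bytes", ["en", "es"])
first_job = store.submit(key_a)
retried_job = store.submit(key_b)  # same logical request, so same job id
```

With this approach a client can safely retry a timed-out upload, and the returned job id doubles as the handle for polling job status.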
## Deep-dive areas to be ready for
- Frontend ↔ backend communication and security (authn/authz, request integrity).
- Data model: what tables/documents exist, what each is used for.
- How you store images and derived artifacts (OCR outputs, translations, rendered images).
- Handling multiple target languages per image and caching.
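For the multi-language and caching point, one approach worth discussing is to run OCR once per distinct image and fan out translation per target language, caching each stage separately. The sketch below assumes that design; `TranslationPipeline`, the cache key choices, and the stub `fake_ocr` / `fake_translate` functions are hypothetical and stand in for real OCR and translation backends.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

OcrFn = Callable[[bytes], List[str]]
TranslateFn = Callable[[str, str], str]


class TranslationPipeline:
    def __init__(self, ocr: OcrFn, translate: TranslateFn) -> None:
        self._ocr = ocr
        self._translate = translate
        # OCR output keyed by content hash: identical uploads share one OCR run.
        self._ocr_cache: Dict[str, List[str]] = {}
        # Translations keyed by (source text, target language).
        self._translation_cache: Dict[Tuple[str, str], str] = {}

    def process(self, image_bytes: bytes, target_languages: Iterable[str]) -> Dict[str, List[str]]:
        image_hash = hashlib.sha256(image_bytes).hexdigest()
        if image_hash not in self._ocr_cache:
            self._ocr_cache[image_hash] = self._ocr(image_bytes)
        blocks = self._ocr_cache[image_hash]

        results: Dict[str, List[str]] = {}
        for lang in target_languages:
            translated: List[str] = []
            for text in blocks:
                key = (text, lang)
                if key not in self._translation_cache:
                    self._translation_cache[key] = self._translate(text, lang)
                translated.append(self._translation_cache[key])
            results[lang] = translated
        return results


# Stub backends that count calls so cache behavior is visible.
calls = {"ocr": 0, "translate": 0}


def fake_ocr(image_bytes: bytes) -> List[str]:
    calls["ocr"] += 1
    return ["Sortie", "Entrée"]


def fake_translate(text: str, lang: str) -> str:
    calls["translate"] += 1
    return f"[{lang}] {text}"


pipeline = TranslationPipeline(fake_ocr, fake_translate)
first = pipeline.process(b"fake-image-bytes", ["en", "es"])
second = pipeline.process(b"fake-image-bytes", ["es", "de"])  # reuses OCR and "es"
```

In this run OCR executes once across both requests, and only the new language ("de") triggers additional translation calls. In a real deployment the two dictionaries would typically be replaced by a shared cache or database table, and the rendered image per language could be cached under a key such as image hash plus language.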
Quick Answer: This question evaluates a candidate's ability to design a scalable, low-latency system that processes images with OCR and multilingual translation. It exercises distributed systems, storage and data modeling, frontend-backend communication, security and privacy, and observability.