You are given two black-box ML services:

- **Classification Service**
  - Input: one or more text documents.
  - Output: a label for each document (e.g., topic or category).
- **Embedding Service**
  - Input: one or more text documents.
  - Output: a vector embedding (e.g., a 768-dim float vector) for each document.
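Since both services are black boxes, it can help to pin down the interface shapes before designing around them. A minimal sketch of assumed client stubs — the function names, endpoints, and placeholder bodies are illustrative assumptions, not part of the problem statement:

```python
from typing import List

# Hypothetical client stubs for the two black-box services. The names and
# return shapes are assumptions; only the input/output contracts above are given.

def classify(documents: List[str]) -> List[str]:
    """Return one label per input document (classification service)."""
    # Placeholder standing in for a remote call, e.g. POST /v1/classify.
    return ["label" for _ in documents]

def embed(documents: List[str]) -> List[List[float]]:
    """Return one fixed-size embedding per input document (embedding service)."""
    # Placeholder standing in for a remote call; 768 dims matches the
    # example dimensionality given above.
    return [[0.0] * 768 for _ in documents]
```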
You need to design a system that:

- Accepts file uploads from users (each file contains one or more text documents).
- Supports both **single-file** and **bulk** upload (up to **1,000 files** in one request).
- For each document:
  - Computes a classification label using the classification service.
  - Computes an embedding using the embedding service.
- Stores results so they can be queried later (e.g., by user, file, or semantic search).
- Satisfies both:
  - **Low latency** for small/single uploads.
  - **High throughput** for large/bulk uploads.
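Since the label and the embedding for a given document come from independent services, the per-document work can be issued concurrently rather than sequentially. A minimal sketch under that assumption, with hypothetical single-document wrappers standing in for the remote calls:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical single-document wrappers around the two black-box services.
def classify_one(doc: str) -> str:
    return "label"          # stands in for a remote classification call

def embed_one(doc: str) -> List[float]:
    return [0.0] * 768      # stands in for a remote embedding call

def process_document(doc: str) -> Tuple[str, List[float]]:
    """Call both services in parallel; the pair feeds the result store."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        label_future = pool.submit(classify_one, doc)
        vector_future = pool.submit(embed_one, doc)
        return label_future.result(), vector_future.result()
```

Overlapping the two calls roughly halves per-document latency on the single-upload path, where latency matters most.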
**Task**

Design the end-to-end pipeline and APIs. Specifically address:

- **API Design**
  - How clients upload files (single and bulk, up to 1,000 files).
  - What responses they receive (synchronous vs. asynchronous).
- **Architecture**
  - How you orchestrate calls to the classification and embedding services.
  - How you store raw files, parsed text, labels, and embeddings.
  - How you achieve both low latency and high throughput.
- **Scalability & Performance**
  - How to handle 1,000-file uploads without running out of memory or violating latency goals.
  - Batching, queuing, and concurrency strategies when talking to the ML services.
- **Reliability & Observability**
  - Error handling for partial failures (e.g., some files fail to process).
  - Monitoring, logging, and metrics.
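To make the batching and concurrency questions above concrete, one possible shape for a bulk worker is to group documents into fixed-size batches and cap the number of in-flight requests, which bounds memory while keeping throughput high. The batch size, worker count, and `classify` stub are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

BATCH_SIZE = 32        # assumed per-request batch limit of the ML service
MAX_IN_FLIGHT = 4      # caps concurrent requests (and therefore memory)

def classify(batch: List[str]) -> List[str]:
    """Stand-in for a call to the black-box classification service."""
    return ["label" for _ in batch]

def classify_bulk(documents: List[str]) -> List[str]:
    """Classify many documents using batching and bounded concurrency."""
    batches = [documents[i:i + BATCH_SIZE]
               for i in range(0, len(documents), BATCH_SIZE)]
    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        # map preserves batch order, so results align with the input.
        results = list(pool.map(classify, batches))
    return [label for batch in results for label in batch]
```

The same pattern applies to the embedding service; in a full design, the worker would also retry failed batches and record per-file status for partial-failure reporting.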
Assume you cannot change the internals of the classification and embedding services; you may only call their APIs.