
Design pipeline using classification and embedding services

Last updated: Apr 20, 2026

Quick Overview

This question evaluates a candidate's proficiency in ML system design, covering API design, service orchestration, data storage modeling, scalability strategies, and reliability/observability when integrating black-box classification and embedding services.



Company: Scale AI

Role: Software Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Onsite




You are given two black-box ML services:

  1. Classification Service
    • Input: One or more text documents.
    • Output: A label for each document (e.g., topic or category).
  2. Embedding Service
    • Input: One or more text documents.
    • Output: A vector embedding (e.g., 768-dim float vector) for each document.
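Since both services are black boxes, it helps to pin down the contract you are coding against before designing anything around them. A minimal Python sketch, assuming batch-call signatures of our own invention (the `ClassificationService` and `EmbeddingService` protocol names and the in-memory fakes are illustrative, not the real vendors' APIs):

```python
from typing import Protocol, Sequence


class ClassificationService(Protocol):
    """Black-box service: returns one label per input document."""
    def classify(self, documents: Sequence[str]) -> list[str]: ...


class EmbeddingService(Protocol):
    """Black-box service: returns one fixed-size float vector per input document."""
    def embed(self, documents: Sequence[str]) -> list[list[float]]: ...


# In-memory stand-ins, useful for exercising the pipeline without the real services.
class FakeClassifier:
    def classify(self, documents: Sequence[str]) -> list[str]:
        return ["sports" if "game" in d else "other" for d in documents]


class FakeEmbedder:
    def embed(self, documents: Sequence[str]) -> list[list[float]]:
        # Toy 2-dim "embedding"; a real service would return e.g. 768 dims.
        return [[float(len(d)), 0.0] for d in documents]
```

Writing the rest of the system against these protocols keeps the orchestration layer testable and decoupled from whichever endpoints the services actually expose.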

You need to design a system that:

  • Accepts file uploads from users (each file contains one or more text documents).
  • Supports both single-file and bulk upload (up to 1,000 files in one request).
  • For each document:
    • Computes a classification label using the classification service.
    • Computes an embedding using the embedding service.
  • Stores results so they can be queried later (e.g., by user, file, or semantic search).
  • Satisfies both:
    • Low latency for small/single uploads.
    • High throughput for large/bulk uploads.
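One common way to satisfy both latency goals above is to answer small uploads synchronously and hand bulk uploads a job handle for later polling. A hedged sketch of that routing decision (the `SYNC_THRESHOLD` cutoff and both response shapes are assumptions for illustration):

```python
from dataclasses import dataclass
from typing import Union
import uuid

SYNC_THRESHOLD = 10  # assumed cutoff: at most this many files processed inline


@dataclass
class SyncResponse:
    """Returned immediately for small uploads: labels inline, embeddings stored server-side."""
    results: dict[str, str]  # file name -> label


@dataclass
class AsyncResponse:
    """Returned for bulk uploads: client polls a hypothetical GET /jobs/{job_id} later."""
    job_id: str
    status: str = "accepted"


def route_upload(file_names: list[str]) -> Union[SyncResponse, AsyncResponse]:
    if len(file_names) <= SYNC_THRESHOLD:
        # Process inline (placeholder label here); low latency for the common case.
        return SyncResponse(results={name: "pending-label" for name in file_names})
    # Enqueue for background workers; high throughput for bulk uploads.
    return AsyncResponse(job_id=str(uuid.uuid4()))
```

The threshold keeps the synchronous path's worst case bounded: anything above it is acknowledged fast and processed off the request path.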

Task

Design the end-to-end pipeline and APIs. Specifically address:

  1. API Design
    • How clients upload files (single and bulk up to 1,000 files).
    • What responses they receive (synchronous vs asynchronous).
  2. Architecture
    • How you orchestrate calls to the classification and embedding services.
    • How you store raw files, parsed text, labels, and embeddings.
    • How you achieve both low latency and high throughput.
  3. Scalability & Performance
    • How to handle 1,000-file uploads without running out of memory or violating latency goals.
    • Batching, queuing, and concurrency strategies when talking to the ML services.
  4. Reliability & Observability
    • Error handling for partial failures (e.g., some files fail to process).
    • Monitoring, logging, and metrics.

Assume you cannot change the internals of the classification and embedding services; you may only call their APIs.
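The batching, bounded-concurrency, and partial-failure requirements above can be sketched together. Assuming the black-box services accept batch calls (the `BATCH_SIZE` and `MAX_WORKERS` values are made-up tuning knobs), this hypothetical worker chunks documents, fans batches out over a thread pool, and records per-document failures instead of failing the whole job:

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 32   # assumed per-call batch limit of the ML services
MAX_WORKERS = 8   # bound on concurrent in-flight requests


def chunked(items, size):
    """Yield fixed-size slices so memory stays bounded even for 1,000-file jobs."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def process_documents(docs, classify, embed):
    """docs: list of (doc_id, text). Returns (successes, failures) dicts keyed by doc_id."""
    results, failures = {}, {}

    def handle_batch(batch):
        ids, texts = [d[0] for d in batch], [d[1] for d in batch]
        try:
            labels = classify(texts)
            vectors = embed(texts)
            return [(i, {"label": l, "embedding": v})
                    for i, l, v in zip(ids, labels, vectors)], []
        except Exception as exc:
            # Partial failure: record this batch as failed, keep the job going.
            return [], [(i, str(exc)) for i in ids]

    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for ok, bad in pool.map(handle_batch, chunked(list(docs), BATCH_SIZE)):
            results.update(ok)
            failures.update(bad)
    return results, failures
```

In a real system the failure map would feed a retry queue and the success map a database plus vector index; the sketch only shows that a bad batch never aborts its siblings. Smaller batches shrink the blast radius of one failed service call at the cost of more round trips.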

