
Design pipeline using classification and embedding services

Last updated: Apr 20, 2026

Quick Overview

This question evaluates a candidate's proficiency in ML system design, covering API design, service orchestration, data storage modeling, scalability strategies, and reliability/observability when integrating black-box classification and embedding services.



Company: Scale AI

Role: Software Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Onsite




You are given two black-box ML services:

  1. Classification Service
    • Input: One or more text documents.
    • Output: A label for each document (e.g., topic or category).
  2. Embedding Service
    • Input: One or more text documents.
    • Output: A vector embedding (e.g., 768-dim float vector) for each document.
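Since both services are black boxes, it helps to pin down the contract you are coding against before designing anything around them. A minimal Python sketch, assuming batch-call signatures of our own invention (the `ClassificationService` and `EmbeddingService` protocol names and the in-memory fakes are illustrative, not the real vendors' APIs):

```python
from typing import Protocol, Sequence


class ClassificationService(Protocol):
    """Black-box service: returns one label per input document."""
    def classify(self, documents: Sequence[str]) -> list[str]: ...


class EmbeddingService(Protocol):
    """Black-box service: returns one fixed-size float vector per input document."""
    def embed(self, documents: Sequence[str]) -> list[list[float]]: ...


# In-memory stand-ins, useful for exercising the pipeline without the real services.
class FakeClassifier:
    def classify(self, documents: Sequence[str]) -> list[str]:
        return ["sports" if "game" in d else "other" for d in documents]


class FakeEmbedder:
    def embed(self, documents: Sequence[str]) -> list[list[float]]:
        # Toy 2-dim "embedding"; a real service would return e.g. 768 dims.
        return [[float(len(d)), 0.0] for d in documents]
```

Writing the rest of the system against these protocols keeps the orchestration layer testable and decoupled from whichever endpoints the services actually expose.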

You need to design a system that:

  • Accepts file uploads from users (each file contains one or more text documents).
  • Supports both single-file and bulk upload (up to 1,000 files in one request).
  • For each document:
    • Computes a classification label using the classification service.
    • Computes an embedding using the embedding service.
  • Stores results so they can be queried later (e.g., by user, file, or semantic search).
  • Satisfies both:
    • Low latency for small/single uploads.
    • High throughput for large/bulk uploads.
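One common way to satisfy both latency goals above is to answer small uploads synchronously and hand bulk uploads a job handle for later polling. A hedged sketch of that routing decision (the `SYNC_THRESHOLD` cutoff and both response shapes are assumptions for illustration):

```python
from dataclasses import dataclass
from typing import Union
import uuid

SYNC_THRESHOLD = 10  # assumed cutoff: at most this many files processed inline


@dataclass
class SyncResponse:
    """Returned immediately for small uploads: labels inline, embeddings stored server-side."""
    results: dict[str, str]  # file name -> label


@dataclass
class AsyncResponse:
    """Returned for bulk uploads: client polls a hypothetical GET /jobs/{job_id} later."""
    job_id: str
    status: str = "accepted"


def route_upload(file_names: list[str]) -> Union[SyncResponse, AsyncResponse]:
    if len(file_names) <= SYNC_THRESHOLD:
        # Process inline (placeholder label here); low latency for the common case.
        return SyncResponse(results={name: "pending-label" for name in file_names})
    # Enqueue for background workers; high throughput for bulk uploads.
    return AsyncResponse(job_id=str(uuid.uuid4()))
```

The threshold keeps the synchronous path's worst case bounded: anything above it is acknowledged fast and processed off the request path.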

Task

Design the end-to-end pipeline and APIs. Specifically address:

  1. API Design
    • How clients upload files (single and bulk up to 1,000 files).
    • What responses they receive (synchronous vs asynchronous).
  2. Architecture
    • How you orchestrate calls to the classification and embedding services.
    • How you store raw files, parsed text, labels, and embeddings.
    • How you achieve both low latency and high throughput.
  3. Scalability & Performance
    • How to handle 1,000-file uploads without running out of memory or violating latency goals.
    • Batching, queuing, and concurrency strategies when talking to the ML services.
  4. Reliability & Observability
    • Error handling for partial failures (e.g., some files fail to process).
    • Monitoring, logging, and metrics.

Assume you cannot change the internals of the classification and embedding services; you may only call their APIs.
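The batching, bounded-concurrency, and partial-failure requirements above can be sketched together. Assuming the black-box services accept batch calls (the `BATCH_SIZE` and `MAX_WORKERS` values are made-up tuning knobs), this hypothetical worker chunks documents, fans batches out over a thread pool, and records per-document failures instead of failing the whole job:

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 32   # assumed per-call batch limit of the ML services
MAX_WORKERS = 8   # bound on concurrent in-flight requests


def chunked(items, size):
    """Yield fixed-size slices so memory stays bounded even for 1,000-file jobs."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def process_documents(docs, classify, embed):
    """docs: list of (doc_id, text). Returns (successes, failures) dicts keyed by doc_id."""
    results, failures = {}, {}

    def handle_batch(batch):
        ids, texts = [d[0] for d in batch], [d[1] for d in batch]
        try:
            labels = classify(texts)
            vectors = embed(texts)
            return [(i, {"label": l, "embedding": v})
                    for i, l, v in zip(ids, labels, vectors)], []
        except Exception as exc:
            # Partial failure: record this batch as failed, keep the job going.
            return [], [(i, str(exc)) for i in ids]

    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for ok, bad in pool.map(handle_batch, chunked(list(docs), BATCH_SIZE)):
            results.update(ok)
            failures.update(bad)
    return results, failures
```

In a real system the failure map would feed a retry queue and the success map a database plus vector index; the sketch only shows that a bad batch never aborts its siblings. Smaller batches shrink the blast radius of one failed service call at the cost of more round trips.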

