PracHub

Design a GPU-aware pod scheduler

Last updated: Apr 24, 2026

Quick Overview

This question evaluates object-oriented system design, resource-aware scheduling algorithms, data structures and indexes for efficient lookups, concurrency control and failure handling, plus time/space complexity reasoning for placing GPU-requesting pods on nodes.

Company: Together AI

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Design an object-oriented, GPU-aware pod scheduler and cluster manager. Each Node has the shape {name: string, total_gpu: int, running_pods: Pod[]}. Each Pod has the shape {name: string, gpu_required: int}. Implement APIs: add_node(name, total_gpu), remove_node(name), add_pod(name, gpu_required), schedule_pod(pod_name) that assigns the pod to a node with enough free GPUs, remove_pod(pod_name), get_node_utilization(name), and list_nodes()/list_pods(). Specify data structures to support efficient lookups of nodes by available GPUs and pods by name. Describe and justify a placement strategy (e.g., best-fit or first-fit) and how you'd update indexes on every add/remove/schedule/evict operation. Discuss concurrency control (simultaneous adds/schedules), idempotency, and failure handling (e.g., removing a node that still has running pods, pod rescheduling on node removal). Provide time and space complexity for each API and write pseudocode for schedule_pod using your chosen strategy. Include edge cases like gpu_required > total_gpu on any node and fragmentation when multiple small pods occupy a large node.

Aug 7, 2025, 12:00 AM

GPU-Aware Pod Scheduler and Cluster Manager (OO Design)

Context

You are designing a simplified, object-oriented cluster manager with a GPU-aware pod scheduler. Nodes provide a fixed number of GPUs. Pods request a fixed number of GPUs and must be placed on a single node with enough free GPUs.

Each Node has the shape: { name: string, total_gpu: int, running_pods: Pod[] }

Each Pod has the shape: { name: string, gpu_required: int }

Assume pods cannot be split across nodes and GPUs are fungible (no topology/NUMA awareness).
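As a rough illustration of the two shapes above, here is a minimal Python sketch. The helper names used_gpu, free_gpu, and can_fit are assumptions added for illustration and are not part of the question.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pod:
    name: str
    gpu_required: int


@dataclass
class Node:
    name: str
    total_gpu: int
    running_pods: List[Pod] = field(default_factory=list)

    @property
    def used_gpu(self) -> int:
        # GPUs are fungible, so usage is just the sum of requests.
        return sum(p.gpu_required for p in self.running_pods)

    @property
    def free_gpu(self) -> int:
        return self.total_gpu - self.used_gpu

    def can_fit(self, pod: Pod) -> bool:
        # A pod cannot be split, so it must fit on this node alone.
        return pod.gpu_required <= self.free_gpu


node = Node(name="n1", total_gpu=8)
node.running_pods.append(Pod(name="a", gpu_required=3))
```

Recomputing used_gpu on each call is O(number of pods on the node); a real design would more likely cache the free count and update it on every placement or eviction.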

Requirements

Implement APIs:

  1. add_node(name, total_gpu)
  2. remove_node(name)
  3. add_pod(name, gpu_required)
  4. schedule_pod(pod_name) — assigns the pod to a node with enough free GPUs
  5. remove_pod(pod_name)
  6. get_node_utilization(name)
  7. list_nodes() / list_pods()
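One possible starting point for the registration and read-only APIs is a pair of name-keyed hash maps, which give O(1) average lookups by name. The sketch below is an assumption-laden outline, not a reference answer: the class name Cluster, the boolean return values, and the choice to treat a repeated identical call as a no-op (and a conflicting one as an error) are all illustrative decisions.

```python
from typing import Dict, List


class Cluster:
    """Hypothetical registry covering add_node, add_pod, and the read APIs."""

    def __init__(self) -> None:
        self.node_total: Dict[str, int] = {}  # node name -> total GPUs
        self.node_used: Dict[str, int] = {}   # node name -> GPUs in use
        self.pod_gpu: Dict[str, int] = {}     # pod name -> GPUs requested

    def add_node(self, name: str, total_gpu: int) -> bool:
        if total_gpu < 0:
            raise ValueError("total_gpu must be non-negative")
        if name in self.node_total:
            if self.node_total[name] != total_gpu:
                raise ValueError(f"node {name!r} exists with a different size")
            return False  # identical repeat: idempotent no-op
        self.node_total[name] = total_gpu
        self.node_used[name] = 0
        return True

    def add_pod(self, name: str, gpu_required: int) -> bool:
        if gpu_required < 0:
            raise ValueError("gpu_required must be non-negative")
        if name in self.pod_gpu:
            if self.pod_gpu[name] != gpu_required:
                raise ValueError(f"pod {name!r} exists with a different request")
            return False
        self.pod_gpu[name] = gpu_required
        return True

    def get_node_utilization(self, name: str) -> float:
        total = self.node_total[name]  # raises KeyError for an unknown node
        return self.node_used[name] / total if total else 0.0

    def list_nodes(self) -> List[str]:
        return sorted(self.node_total)

    def list_pods(self) -> List[str]:
        return sorted(self.pod_gpu)


cluster = Cluster()
first = cluster.add_node("n1", 8)
second = cluster.add_node("n1", 8)  # repeated call changes nothing
```

Returning a flag rather than raising on an identical repeat is one way to make retries safe; raising on a conflicting repeat keeps a stale retry from silently resizing a node.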

Also provide:

  • Data structures to support efficient lookups of nodes by available GPUs and pods by name.
  • A placement strategy (e.g., best-fit or first-fit) and justification.
  • How to update indexes on every add/remove/schedule/evict operation.
  • Concurrency control for simultaneous adds/schedules, idempotency, and failure handling (e.g., removing a node that still has running pods; pod rescheduling on node removal).
  • Time and space complexity for each API.
  • Pseudocode for schedule_pod using your chosen strategy.
  • Edge cases, including a pod whose gpu_required exceeds total_gpu on every node, and fragmentation when multiple small pods occupy a large node.
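To make the deliverables above more concrete, here is one hedged sketch of a best-fit schedule_pod in Python. It assumes a single coarse lock for concurrency, a sorted list of (free_gpu, node_name) pairs as the index by available GPUs, and None as the "no node fits" result; the class name BestFitScheduler and all private field names are hypothetical. Best-fit is only one reasonable choice: it tends to keep large nodes free for large pods, at the cost of index maintenance on every placement.

```python
import bisect
import threading
from typing import Dict, List, Optional, Tuple


class BestFitScheduler:
    def __init__(self) -> None:
        self._lock = threading.Lock()               # one coarse lock (assumption)
        self._free: Dict[str, int] = {}             # node name -> free GPUs
        self._by_free: List[Tuple[int, str]] = []   # sorted (free_gpu, node name)
        self._pod_gpu: Dict[str, int] = {}          # pod name -> GPUs requested
        self._pod_node: Dict[str, str] = {}         # pod name -> assigned node

    def _reindex(self, node: str, new_free: int) -> None:
        # Remove the stale index entry, then insert the updated one.
        old = (self._free[node], node)
        del self._by_free[bisect.bisect_left(self._by_free, old)]
        self._free[node] = new_free
        bisect.insort(self._by_free, (new_free, node))

    def add_node(self, name: str, total_gpu: int) -> None:
        with self._lock:
            if name in self._free:
                return  # repeated call is a no-op
            self._free[name] = total_gpu
            bisect.insort(self._by_free, (total_gpu, name))

    def add_pod(self, name: str, gpu_required: int) -> None:
        with self._lock:
            self._pod_gpu.setdefault(name, gpu_required)

    def schedule_pod(self, pod_name: str) -> Optional[str]:
        with self._lock:
            if pod_name in self._pod_node:
                return self._pod_node[pod_name]  # already placed: idempotent
            need = self._pod_gpu[pod_name]       # KeyError for an unknown pod
            # First entry with free_gpu >= need is the tightest fit.
            i = bisect.bisect_left(self._by_free, (need, ""))
            if i == len(self._by_free):
                return None  # no node has enough free GPUs right now
            _, node = self._by_free[i]
            self._reindex(node, self._free[node] - need)
            self._pod_node[pod_name] = node
            return node

    def remove_pod(self, pod_name: str) -> None:
        with self._lock:
            need = self._pod_gpu.pop(pod_name, None)
            node = self._pod_node.pop(pod_name, None)
            if node is not None and need is not None:
                self._reindex(node, self._free[node] + need)


sched = BestFitScheduler()
sched.add_node("big", 8)
sched.add_node("small", 4)
sched.add_pod("p1", 3)
sched.add_pod("huge", 16)
placed = sched.schedule_pod("p1")      # tightest fit is "small"
rejected = sched.schedule_pod("huge")  # exceeds every node
```

With a plain Python list, the binary search is O(log n) but each insert or delete shifts elements, so an index update is O(n) in the number of nodes. A balanced tree or skip list would bring updates down to O(log n). Space is O(nodes + pods). Node removal and rescheduling of its pods are left out of this sketch.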
