This question evaluates object-oriented system design, resource-aware scheduling algorithms, data structures and indexes for efficient lookups, concurrency control and failure handling, plus time/space complexity reasoning for placing GPU-requesting pods on nodes.
You are designing a simplified, object-oriented cluster manager with a GPU-aware pod scheduler. Nodes provide a fixed number of GPUs. Pods request a fixed number of GPUs and must be placed on a single node with enough free GPUs.
Each Node has the shape: { name: string, total_gpu: int, running_pods: Pod[] }
Each Pod has the shape: { name: string, gpu_required: int }
Assume pods cannot be split across nodes and GPUs are fungible (no topology/NUMA awareness).
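The data model above can be sketched in Python. The field names follow the shapes given; the `free_gpu` helper is an assumption added for illustration, not part of the stated spec:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Pod:
    name: str
    gpu_required: int  # GPUs this pod needs on a single node

@dataclass
class Node:
    name: str
    total_gpu: int
    running_pods: List[Pod] = field(default_factory=list)

    @property
    def free_gpu(self) -> int:
        # GPUs not claimed by any running pod; pods cannot span nodes,
        # so a pod fits only if gpu_required <= free_gpu.
        return self.total_gpu - sum(p.gpu_required for p in self.running_pods)
```

Because GPUs are fungible, a node's state reduces to a single free-GPU count, which is what makes an index over nodes practical.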
Implement APIs:
Also provide:
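The concrete API list is not reproduced above, so the following is a hedged sketch under assumed names (`Scheduler`, `add_node`, `schedule` are all hypothetical). It shows one common approach: a best-fit placement backed by a sorted index of `(free_gpu, node_name)` pairs, giving an O(log n) candidate lookup per pod (the `list` splice is O(n); a balanced tree or skip list would make updates O(log n) too):

```python
import bisect

class Scheduler:
    """Hypothetical best-fit scheduler: each pod goes to the node with
    the smallest free-GPU count that still satisfies the request."""

    def __init__(self):
        self.nodes = {}   # name -> node dict, matching the Node shape
        self.index = []   # sorted list of (free_gpu, node_name)

    def add_node(self, name, total_gpu):
        self.nodes[name] = {"name": name, "total_gpu": total_gpu,
                            "running_pods": []}
        bisect.insort(self.index, (total_gpu, name))

    def schedule(self, pod_name, gpu_required):
        # O(log n) search for the first node with free_gpu >= gpu_required;
        # "" sorts before any real node name, so ties are found correctly.
        i = bisect.bisect_left(self.index, (gpu_required, ""))
        if i == len(self.index):
            return None  # no node has enough free GPUs
        free, name = self.index.pop(i)
        node = self.nodes[name]
        node["running_pods"].append({"name": pod_name,
                                     "gpu_required": gpu_required})
        # Re-insert the node at its reduced free-GPU count.
        bisect.insort(self.index, (free - gpu_required, name))
        return name
```

A removal API would mirror `schedule`: delete the pod from `running_pods`, pop the node's stale index entry, and re-insert it at the increased free count. For concurrency, a per-scheduler lock (or per-node locks with a consistent ordering) keeps the index and `running_pods` in sync; for failure handling, a failed placement should roll back both structures atomically.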