How do I approach Software Engineering Fundamentals interview questions?

Software Engineering Fundamentals questions require an understanding of core concepts plus hands-on practice. PracHub provides solutions with explanations to help you master software engineering fundamentals interviews.

What difficulty level is this interview question?

This is a medium difficulty Software Engineering Fundamentals question, commonly asked during Technical Screen rounds at Heygen.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Heygen during technical interviews.

Build a GPU VM Fleet CLI | Heygen Interview Question

Q: Build a GPU VM Fleet CLI

This question evaluates competency in designing provider-agnostic tooling for managing GPU virtual machines and fleets, covering API integration, lifecycle and reservation semantics, state reconciliation, and fault handling.

You are given a repository containing three mock cloud-provider servers that simulate GPU VM providers — Crusoe Cloud, Lambda Cloud, and Nebius AI Cloud. This is a live AI-assisted coding exercise: build a provider-agnostic CLI tool for requesting and managing GPU virtual machines and fleets of machines across all three providers.

Background

Your company rents GPU VMs from three providers, each with a different API shape and lifecycle model:

Provider	Protocol	Resource model	Operations	Reservations
Crusoe Cloud	REST	Project-scoped	Asynchronous (operation IDs)	Auto-placed into cheapest matching reservation; explicit `reservation_id` optional; stop releases capacity, start reclaims it. Lifecycle includes reboot/reset/restart semantics.
Lambda Cloud	REST	Flat / simple	Mostly synchronous	Instances flagged `is_reserved: true`; reserved instances cannot be terminated via the API; launching with `reservation_id` uses reserved capacity.
Nebius AI Cloud	gRPC	Parent-scoped	Asynchronous (operation IDs)	A `ReservationPolicy` in the instance spec: `AUTO` (try reservation first), `FORBID` (always on-demand), `STRICT` (must use a specific reservation, else fail).
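The reservation column is where provider semantics diverge most. One way to keep them inside the adapters is to let callers state a provider-neutral preference that each adapter translates into its own request fields. The sketch below is a minimal illustration under stated assumptions: the `ReservationPref` enum (modeled on Nebius's AUTO/FORBID/STRICT), the helper names, and the request field names are hypothetical, and the mock servers' real schemas may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReservationPref(Enum):
    AUTO = "auto"      # prefer reserved capacity, fall back to on-demand
    FORBID = "forbid"  # always on-demand
    STRICT = "strict"  # must land in the given reservation, else fail


class UnsupportedReservationPolicy(Exception):
    pass


@dataclass
class ReservationRequest:
    pref: ReservationPref = ReservationPref.AUTO
    reservation_id: Optional[str] = None


def _require_id(req: ReservationRequest) -> str:
    if not req.reservation_id:
        raise ValueError("STRICT requires a reservation_id")
    return req.reservation_id


def crusoe_fields(req: ReservationRequest) -> dict:
    # Crusoe auto-places into the cheapest matching reservation, so AUTO
    # needs no field and STRICT pins an explicit reservation_id.
    if req.pref is ReservationPref.FORBID:
        # Assumption: the mock offers no way to opt out of auto-placement.
        raise UnsupportedReservationPolicy("crusoe cannot forbid reservations")
    if req.pref is ReservationPref.STRICT:
        return {"reservation_id": _require_id(req)}
    return {}


def lambda_fields(req: ReservationRequest) -> dict:
    # Lambda uses reserved capacity only when launched with a reservation_id.
    if req.pref is ReservationPref.STRICT:
        return {"reservation_id": _require_id(req)}
    if req.pref is ReservationPref.AUTO and req.reservation_id:
        return {"reservation_id": req.reservation_id}
    return {}  # FORBID, or AUTO with no known reservation: on-demand


def nebius_fields(req: ReservationRequest) -> dict:
    # Nebius carries the policy in the instance spec (field names hypothetical).
    spec = {"reservation_policy": req.pref.name}
    if req.pref is ReservationPref.STRICT:
        spec["reservation_id"] = _require_id(req)
    return spec
```

Raising on a combination a provider cannot honor (here, FORBID on Crusoe) surfaces the mismatch instead of silently placing the VM on reserved capacity.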

The deliverable is split into two layers (Part 1 and Part 2 below) plus a follow-up discussion. The interviewer cares far more about your design — how you isolate provider differences, model state, and handle partial failure — than about exhaustively wiring every endpoint.

Constraints & Assumptions

Mock servers are provided. You do not have to handle real auth/billing, but you must call the three mock APIs (two REST, one gRPC) through their real interfaces.
Scale of the exercise: a fleet is on the order of single digits to a few dozen VMs; the CLI runs as a short-lived process invoked repeatedly from a shell.
Time budget (guidance): Layer 1 ~30–40 min, Layer 2 ~40–50 min. Favor a clean, extensible design over feature completeness.
Durability: fleet membership must survive process exit (the CLI is invoked once per command), so state lives outside the process.
Mixed sync/async: Crusoe and Nebius return operation IDs that must be polled; Lambda is mostly synchronous. Higher layers should not care which is which.
GPU types / regions are passed as opaque strings (e.g. `h100`, `us-east`); a provider may not support a given type or region.
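Since Crusoe and Nebius hand back operation IDs while Lambda mostly answers synchronously, each async adapter can hide polling behind a blocking call so higher layers never see the difference. A minimal sketch of a shared poller with capped exponential backoff, assuming a hypothetical `get_status(op_id)` callable that returns `"pending"`, `"done"`, or `"failed"` (the mocks' real operation payloads will need their own mapping):

```python
import time
from typing import Callable


class OperationTimeout(Exception):
    pass


class OperationFailed(Exception):
    pass


def wait_for_operation(
    op_id: str,
    get_status: Callable[[str], str],
    timeout_s: float = 300.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until the operation finishes, backing off exponentially."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status(op_id)
        if status == "done":
            return
        if status == "failed":
            raise OperationFailed(op_id)
        # Give up before a sleep that would overshoot the deadline.
        if clock() + delay > deadline:
            raise OperationTimeout(f"operation {op_id} still {status} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

A synchronous adapter such as Lambda's would simply skip this call; injecting `sleep` and `clock` keeps the timeout path unit-testable without real waiting.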

Clarifying Questions to Ask

What is the source of truth for "all instances" — do we list only VMs this CLI created, or every VM in each provider account?
For `vm fleet create`, what allocation policy is expected (cheapest-first, spread evenly, reservation-first), and is the count a hard requirement or best-effort?
If a fleet can only partially fill (e.g. 6 of 10), should it roll back to zero, or keep the partial fleet and report a shortfall?
Where should fleet state live — local file/SQLite for the exercise, or are we expected to design for a shared server-side store?
What output format(s) must commands support (human table, JSON, both)?
How are credentials / provider endpoints supplied (env vars, config file)?

Part 1 — Unified CLI (Layer 1)

Build a CLI that presents one consistent interface for managing individual VMs across all three providers. Each command targets a single provider (except `list`, which may span all). Required commands:

# List all instances across all providers, or filter by provider
vm list [--provider <name>]

# Create new instance(s)
vm create --provider <name> --gpu <type> --count <n> [--name <name>] [--region <region>]

# Get instance details
vm get <instance_id> --provider <name>

# Stop an instance
vm stop <instance_id> --provider <name>

# Start an instance
vm start <instance_id> --provider <name>

# Destroy / terminate an instance
vm destroy <instance_id> --provider <name>
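The command surface above maps naturally onto standard-library `argparse` subparsers. A minimal sketch of the parsing layer only; the lowercase provider spellings are assumptions, and `--output` is attached to every subcommand through a shared parent parser:

```python
import argparse

PROVIDERS = ["crusoe", "lambda", "nebius"]  # assumed CLI spellings


def positive_int(value: str) -> int:
    n = int(value)
    if n < 1:
        raise argparse.ArgumentTypeError("must be >= 1")
    return n


def build_parser() -> argparse.ArgumentParser:
    # Shared flags, attached to every subcommand via `parents=`.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--output", choices=["table", "json"], default="table")

    parser = argparse.ArgumentParser(prog="vm")
    sub = parser.add_subparsers(dest="command", required=True)

    ls = sub.add_parser("list", parents=[common], help="list instances")
    ls.add_argument("--provider", choices=PROVIDERS)

    create = sub.add_parser("create", parents=[common], help="create instance(s)")
    create.add_argument("--provider", choices=PROVIDERS, required=True)
    create.add_argument("--gpu", required=True)
    create.add_argument("--count", type=positive_int, required=True)
    create.add_argument("--name")
    create.add_argument("--region")

    # get/stop/start/destroy all take an instance ID plus a provider.
    for cmd in ("get", "stop", "start", "destroy"):
        p = sub.add_parser(cmd, parents=[common])
        p.add_argument("instance_id")
        p.add_argument("--provider", choices=PROVIDERS, required=True)
    return parser
```

`vm create --provider nebius --gpu h100 --count 2 --output json` parses to a namespace with `command="create"` and `count=2`; a handler table keyed on `args.command` can then dispatch without any provider-specific branches.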

The hard part is not argument parsing — it is designing a single abstraction (a ProviderClient-style interface plus a normalized Instance model) so that the CLI layer never branches on provider, and so a fourth provider could be added by writing one new adapter.
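A minimal sketch of that seam using Python's `abc` module. The method names, the `Instance` fields, and the module-level registry are illustrative choices rather than anything the exercise prescribes; each real adapter would wrap one mock server's REST or gRPC client:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Instance:
    provider: str
    instance_id: str
    name: str
    gpu_type: str
    state: str                      # canonical state, never a raw provider string
    region: Optional[str] = None
    is_reserved: bool = False
    raw: dict = field(default_factory=dict)  # original payload, for debugging


class ProviderClient(ABC):
    """One adapter per provider; hides protocol, scoping, and polling."""

    name: str

    @abstractmethod
    def list_instances(self) -> List[Instance]: ...

    @abstractmethod
    def create(self, gpu: str, count: int, name: Optional[str] = None,
               region: Optional[str] = None) -> List[Instance]: ...

    @abstractmethod
    def get(self, instance_id: str) -> Instance: ...

    @abstractmethod
    def stop(self, instance_id: str) -> Instance: ...

    @abstractmethod
    def start(self, instance_id: str) -> Instance: ...

    @abstractmethod
    def destroy(self, instance_id: str) -> None: ...


REGISTRY: Dict[str, ProviderClient] = {}


def register(client: ProviderClient) -> None:
    REGISTRY[client.name] = client


def list_all(provider: Optional[str] = None) -> List[Instance]:
    # The CLI never branches on provider: every adapter answers the same calls.
    clients = [REGISTRY[provider]] if provider else list(REGISTRY.values())
    return [inst for client in clients for inst in client.list_instances()]
```

Adding a fourth provider then means one new `ProviderClient` subclass plus a `register(...)` call; `list_all` and every other caller stay unchanged.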

What This Part Should Cover

A clean provider abstraction (common interface + normalized model + per-provider adapters) with no provider branching above the adapter layer.
Correct mapping of each command to each provider's real API, including project-/parent-scoping for Crusoe/Nebius and the gRPC vs REST split.
Normalization of heterogeneous provider states (e.g. `pending`, `provisioning`, `running`) into a single canonical state enum.
Consistent, ideally machine-readable output (a table plus `--output json`), and sensible argument validation.
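State normalization can be a per-adapter lookup table feeding one canonical enum, with the table and JSON renderers consuming only canonical values. A minimal sketch; every raw state string in `STATE_MAPS` is a hypothetical placeholder to be replaced with what each mock server actually returns:

```python
import json
from enum import Enum
from typing import Dict, List


class InstanceState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"
    TERMINATING = "terminating"
    TERMINATED = "terminated"
    ERROR = "error"
    UNKNOWN = "unknown"


S = InstanceState
# Hypothetical raw strings: replace with what each mock server really returns.
STATE_MAPS: Dict[str, Dict[str, InstanceState]] = {
    "crusoe": {"provisioning": S.PENDING, "running": S.RUNNING,
               "stopped": S.STOPPED, "deleting": S.TERMINATING},
    "lambda": {"booting": S.PENDING, "active": S.RUNNING,
               "terminating": S.TERMINATING, "terminated": S.TERMINATED,
               "unhealthy": S.ERROR},
    "nebius": {"CREATING": S.PENDING, "STARTING": S.PENDING,
               "RUNNING": S.RUNNING, "STOPPED": S.STOPPED,
               "DELETING": S.TERMINATING},
}


def normalize_state(provider: str, raw: str) -> InstanceState:
    # Unmapped strings become UNKNOWN rather than crashing `vm list`.
    return STATE_MAPS.get(provider, {}).get(raw, S.UNKNOWN)


def render(rows: List[dict], output: str,
           cols=("provider", "instance_id", "state")) -> str:
    if output == "json":
        return json.dumps(rows, indent=2)
    # Size each column to its widest cell (or header) for an aligned table.
    widths = [max([len(c)] + [len(str(r[c])) for r in rows]) for c in cols]
    header = "  ".join(c.upper().ljust(w) for c, w in zip(cols, widths))
    body = ["  ".join(str(r[c]).ljust(w) for c, w in zip(cols, widths)) for r in rows]
    return "\n".join(line.rstrip() for line in [header] + body)
```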

Part 2 — Fleet Manager (Layer 2)

Build on top of Part 1 to manage a fleet — a logical group of VMs of one GPU type that may be spread across multiple providers and must be tracked as a unit. Required commands:

# Request N machines of a given GPU type, spread across providers
vm fleet create --gpu <type> --count <n> [--name <fleet_name>]

# List all fleets
vm fleet list

# Show fleet status (which VMs, which providers, which states)
vm fleet status <fleet_name>

# Destroy an entire fleet
vm fleet destroy <fleet_name>

`fleet create` is the centerpiece: it must allocate N machines across providers, persist membership durably as it goes, and behave predictably when it can only partially fill the request or fails midway.
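A minimal sketch of the `fleet create` loop, assuming a round-robin (spread-evenly) allocation policy and a keep-the-partial-fleet-and-report-the-shortfall policy; `create_one` and `record` are hypothetical stand-ins for a provider adapter and the durable store. The property that matters is that each VM is recorded as soon as its create call returns, so a crash midway leaves a record of everything that exists:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class InsufficientCapacity(Exception):
    pass


@dataclass
class FleetResult:
    requested: int
    members: List[Tuple[str, str]] = field(default_factory=list)  # (provider, id)
    errors: Dict[str, str] = field(default_factory=dict)

    @property
    def status(self) -> str:
        if len(self.members) == self.requested:
            return "ACTIVE"
        return "PARTIAL" if self.members else "FAILED"


def create_fleet(
    count: int,
    providers: List[str],
    create_one: Callable[[str], str],      # provider -> new instance_id
    record: Callable[[str, str], None],    # persist (provider, instance_id)
) -> FleetResult:
    result = FleetResult(requested=count)
    active = list(providers)  # providers still believed to have capacity
    i = 0
    while len(result.members) < count and active:
        provider = active[i % len(active)]
        try:
            instance_id = create_one(provider)
        except InsufficientCapacity as exc:
            # Drop the exhausted provider and keep filling from the rest.
            result.errors[provider] = str(exc)
            active.remove(provider)
            continue
        record(provider, instance_id)  # persist before anything else can fail
        result.members.append((provider, instance_id))
        i += 1
    return result
```

Errors other than `InsufficientCapacity` propagate, but every member created before the failure is already persisted, so the caller can apply its rollback-or-keep policy from the store rather than from memory.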

Clarifying Questions for this Part

Is --name optional, and if so how are unnamed fleets identified (generated name, sequence)?
Should fleet create be idempotent on retry (re-running with the same name resumes vs. creates a second fleet)?
Does fleet destroy need to handle a fleet that's already partially destroyed or has cleanup-failed members?

What This Part Should Cover

A durable fleet store (membership survives process exit) with the right records: fleet metadata plus per-VM (provider, provider_instance_id, state, reservation info).
An explicit, encapsulated allocation strategy and a clear definition of success vs. partial fill.
Correct partial-failure handling: incremental persistence, a stated rollback-vs-keep policy, best-effort retryable cleanup, and not destroying out-of-fleet VMs.
A coherent fleet state machine (`CREATING` → `ACTIVE` / `PARTIAL` / `FAILED` → `DESTROYED`), reflected consistently in `vm fleet status`.
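For the fleet store, a local SQLite file is one reasonable choice at this scale. A minimal sketch using the standard-library `sqlite3` module; the table and column names are assumptions, and the `TRANSITIONS` map encodes the fleet state machine so an illegal jump (for example, back to `CREATING`) is rejected at write time:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS fleets (
    name      TEXT PRIMARY KEY,
    gpu_type  TEXT NOT NULL,
    requested INTEGER NOT NULL,
    state     TEXT NOT NULL CHECK (state IN
              ('CREATING', 'ACTIVE', 'PARTIAL', 'FAILED', 'DESTROYED'))
);
CREATE TABLE IF NOT EXISTS fleet_members (
    fleet_name           TEXT NOT NULL REFERENCES fleets(name),
    provider             TEXT NOT NULL,
    provider_instance_id TEXT NOT NULL,
    state                TEXT NOT NULL,
    reservation_id       TEXT,
    is_reserved          INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (provider, provider_instance_id)
);
"""

# Allowed fleet-level transitions: CREATING -> ACTIVE/PARTIAL/FAILED -> DESTROYED.
TRANSITIONS = {
    "CREATING": {"ACTIVE", "PARTIAL", "FAILED"},
    "ACTIVE": {"DESTROYED"},
    "PARTIAL": {"DESTROYED"},
    "FAILED": {"DESTROYED"},
    "DESTROYED": set(),
}


class FleetStore:
    def __init__(self, path: str = "fleets.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute("PRAGMA foreign_keys = ON")  # SQLite defaults to off
        self.conn.executescript(SCHEMA)

    def create_fleet(self, name: str, gpu_type: str, requested: int) -> None:
        with self.conn:  # commit on success, roll back on exception
            self.conn.execute(
                "INSERT INTO fleets (name, gpu_type, requested, state) "
                "VALUES (?, ?, ?, 'CREATING')", (name, gpu_type, requested))

    def add_member(self, fleet: str, provider: str, instance_id: str,
                   state: str, reservation_id=None, is_reserved=False) -> None:
        # Called right after each successful create, so membership is durable.
        with self.conn:
            self.conn.execute(
                "INSERT INTO fleet_members VALUES (?, ?, ?, ?, ?, ?)",
                (fleet, provider, instance_id, state, reservation_id, int(is_reserved)))

    def transition(self, name: str, new_state: str) -> None:
        with self.conn:
            row = self.conn.execute(
                "SELECT state FROM fleets WHERE name = ?", (name,)).fetchone()
            if row is None:
                raise KeyError(name)
            if new_state not in TRANSITIONS[row[0]]:
                raise ValueError(f"illegal transition {row[0]} -> {new_state}")
            self.conn.execute(
                "UPDATE fleets SET state = ? WHERE name = ?", (new_state, name))

    def members(self, name: str) -> list:
        return self.conn.execute(
            "SELECT provider, provider_instance_id, state FROM fleet_members "
            "WHERE fleet_name = ? ORDER BY rowid", (name,)).fetchall()
```

The composite primary key on `(provider, provider_instance_id)` stops the same VM from being recorded twice, and each write commits in its own transaction, so membership survives a crash between calls.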

What a Strong Answer Covers

Across both parts, the interviewer is watching for the design instincts that separate a thin wrapper from a maintainable tool:

Separation of concerns: provider differences (protocol, scoping, sync/async, reservation semantics) are quarantined inside adapters; the CLI and fleet layers speak only the normalized model.
Idempotency & duplicate-creation safety: retries of expensive GPU create calls must not silently double-allocate — via request IDs, idempotency keys, or tagging/naming instances with fleet metadata so an interrupted create can be reconciled.
Durable, recoverable state: state is written before/after each meaningful transition so a crash mid-create leaves a recoverable record, not orphaned VMs.
Honest failure reporting: structured error types (e.g. `InsufficientCapacity`, `ReservedInstanceCannotBeTerminated`, `OperationTimeout`) surfaced as clear human-readable messages, never silent success.
Testability: the adapter seam makes state-mapping, reservation-policy translation, and partial-failure rollback unit-testable against the mock servers.
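One way to get duplicate-creation safety without depending on provider-side idempotency keys is to give every fleet slot a deterministic name and, before creating anything, list what already exists. A minimal sketch, assuming a hypothetical `fleet-<name>-<slot>` naming convention and that each provider lets you set an instance name and list instances by it:

```python
from typing import Callable, Dict, List


def slot_name(fleet: str, slot: int) -> str:
    # Deterministic: a retry of the same fleet computes the same names.
    return f"fleet-{fleet}-{slot:03d}"


def reconcile_and_fill(
    fleet: str,
    count: int,
    existing_names: List[str],           # names found via provider list calls
    create_named: Callable[[str], str],  # name -> new instance_id
) -> Dict[str, str]:
    """Create only the missing slots; return {name: instance_id} for new VMs."""
    present = set(existing_names)
    created: Dict[str, str] = {}
    for slot in range(count):
        name = slot_name(fleet, slot)
        if name in present:
            continue  # an earlier, interrupted run already created this slot
        created[name] = create_named(name)
    return created
```

This covers interrupted retries but not two concurrent CLI processes filling the same fleet, or a provider whose list call lags behind a just-completed create; those cases still need a lock or a server-side idempotency key.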

Follow-up Questions

What are the major API differences between the three providers (protocol, scoping, operation style, reservation semantics), and how does your code keep them from leaking past the adapter layer?
How do you store the final set of machines that belong to a fleet, and what schema makes `status`, cleanup, and idempotent retry possible?
How do you clean up partially created machines when fleet create fails midway — including the cases where destroy is async, a Lambda member is reserved/non-terminable, or the CLI crashes during cleanup?
What problems could occur in implementation or production (duplicate allocation, rate limits, lost responses, stale local state), and how would you mitigate each?
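For the cleanup follow-up, one approach is a best-effort pass that never stops at the first error, never calls terminate on members known to be reserved, and persists each outcome as it goes so a rerun of `vm fleet destroy` skips finished work. A minimal sketch, assuming hypothetical member dicts, a `destroy` callable that already waits on async operations, and invented `RESERVED_RETAINED`/`CLEANUP_FAILED` member-state labels:

```python
from typing import Callable, Dict, List


class ReservedInstanceCannotBeTerminated(Exception):
    pass


def cleanup(
    members: List[dict],
    destroy: Callable[[str, str], None],  # (provider, instance_id); waits for async ops
    mark: Callable[[dict, str], None],    # persist a member's new state immediately
) -> Dict[str, List[str]]:
    report: Dict[str, List[str]] = {
        "DESTROYED": [], "RESERVED_RETAINED": [], "CLEANUP_FAILED": []}
    for m in members:
        if m["state"] == "DESTROYED":
            continue  # finished by an earlier, possibly crashed, run
        key = f'{m["provider"]}/{m["instance_id"]}'
        try:
            if m.get("is_reserved"):
                # Lambda refuses to terminate reserved instances via the API.
                raise ReservedInstanceCannotBeTerminated(key)
            destroy(m["provider"], m["instance_id"])
        except ReservedInstanceCannotBeTerminated:
            outcome = "RESERVED_RETAINED"
        except Exception:  # keep going; the member is retried on the next run
            outcome = "CLEANUP_FAILED"
        else:
            outcome = "DESTROYED"
        mark(m, outcome)
        report[outcome].append(key)
    return report
```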
