PracHub

Debug GPU Resource Allocation

Last updated: Jun 5, 2026

Quick Overview

This question evaluates debugging and root-cause analysis skills focused on GPU resource allocation, lease management, scoring and preemption policies, as well as test-driven troubleshooting and log interpretation.


Company: Mithril

Role: Software Engineer

Category: Software Engineering Fundamentals

Difficulty: medium

Interview Round: Onsite



May 18, 2026, 12:00 AM

You are given a small codebase for a GPU resource manager. The repository includes a README, logs, and unit tests. Your task is to use the failing tests and logs to identify root causes, make minimal fixes, and explain your changes.

The system assigns jobs to GPUs. Each job has fields such as required GPU type, memory requirement, priority, and optional user preferences for specific GPU IDs or GPU types. Each GPU has fields such as ID, type, available memory, health status, current lease owner, and lease expiration time.
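To make this data model concrete, here is a minimal Python sketch. Every name in it (`GPU`, `Job`, `is_eligible`, `rank_key`, `pick_gpu`, the field names, and the best-fit tie-breaker) is a hypothetical stand-in for whatever the actual repository uses, and treating a `None` required type as "any type" is an assumption. It separates hard constraints (health, type, memory, lease) from soft ranking, and ranks explicit user preferences ahead of any numeric fit, which is one plausible way to make the preference behavior covered by the first group of failing tests deterministic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GPU:
    # Hypothetical field names; the real repository may differ.
    gpu_id: str
    gpu_type: str
    free_memory_gb: int
    healthy: bool = True
    lease_owner: Optional[str] = None  # job ID holding the GPU, if any
    lease_expires_at: float = 0.0      # epoch seconds; <= now means expired


@dataclass
class Job:
    job_id: str
    gpu_type: Optional[str]            # assumption: None means "any type"
    memory_gb: int
    priority: int = 0
    preferred_gpu_ids: List[str] = field(default_factory=list)
    preferred_gpu_types: List[str] = field(default_factory=list)


def is_eligible(job: Job, gpu: GPU, now: float) -> bool:
    """Hard constraints: a GPU failing any of these is never a candidate."""
    lease_free = gpu.lease_owner is None or gpu.lease_expires_at <= now
    return (
        gpu.healthy
        and (job.gpu_type is None or gpu.gpu_type == job.gpu_type)
        and gpu.free_memory_gb >= job.memory_gb
        and lease_free
    )


def rank_key(job: Job, gpu: GPU) -> tuple:
    """Soft ranking among eligible GPUs; smaller keys sort first (better)."""
    return (
        gpu.gpu_id not in job.preferred_gpu_ids,      # preferred ID first
        gpu.gpu_type not in job.preferred_gpu_types,  # then preferred type
        gpu.free_memory_gb - job.memory_gb,           # then tightest memory fit
        gpu.gpu_id,                                   # deterministic tie-break
    )


def pick_gpu(job: Job, gpus: List[GPU], now: float) -> Optional[GPU]:
    """Filter by hard constraints, then take the best-ranked candidate."""
    candidates = [g for g in gpus if is_eligible(job, g, now)]
    return min(candidates, key=lambda g: rank_key(job, g), default=None)
```

For example, with an idle 80 GB `g0` and an idle 40 GB `g1`, a 30 GB job that prefers `g0` lands on `g0` even though `g1` is the tighter fit; without the preference it lands on `g1`. Putting preference in the leading tuple position is a policy choice. If the real scorer is a weighted sum instead, the analogous fix is to make the preference term large enough that no combination of other terms can outweigh it, and a test with two otherwise-equal GPUs pins that down.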

The failing tests cover three areas:

  1. GPU scoring and preference handling: the selected GPU does not respect the user's preference when multiple GPUs are otherwise valid.
  2. Resource allocation and preemption: a GPU can be assigned without creating or updating a lease, which can lead to double allocation.
  3. Smarter preemption: when preemption is necessary, candidate GPUs should be ranked by score before choosing which running job to preempt.
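The allocation and preemption areas can be sketched together. In this hypothetical `Allocator` (all names, the 300-second default lease, the caller-supplied `score` function, and the tie-breaking rule are assumptions, and eligibility filtering is assumed to have happened earlier), the only way to assign a GPU is through `_grant`, which records owner, priority, and expiry in one step under a lock. That rules out the "assigned but no lease" state behind the double-allocation bug. When no GPU is free, preemption considers only GPUs held by strictly lower-priority jobs and ranks them by score before choosing whom to evict.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class GPU:
    # Hypothetical, trimmed-down model; eligibility is assumed checked earlier.
    gpu_id: str
    lease_owner: Optional[str] = None
    lease_priority: int = 0          # priority of the job holding the lease
    lease_expires_at: float = 0.0


class Allocator:
    def __init__(self, gpus: List[GPU], lease_seconds: float = 300.0):
        self.gpus: Dict[str, GPU] = {g.gpu_id: g for g in gpus}
        self.lease_seconds = lease_seconds
        self._lock = threading.Lock()

    def _grant(self, gpu: GPU, job_id: str, priority: int, now: float) -> None:
        # Assignment and lease are written together: no code path hands
        # out a GPU without recording its owner and expiry.
        gpu.lease_owner = job_id
        gpu.lease_priority = priority
        gpu.lease_expires_at = now + self.lease_seconds

    def allocate(
        self, job_id: str, priority: int, now: float,
        score: Callable[[GPU], float],
    ) -> Optional[Tuple[str, Optional[str]]]:
        """Return (gpu_id, preempted_job_id), or None if nothing fits."""
        with self._lock:  # check-then-lease must be atomic across callers
            free = [g for g in self.gpus.values()
                    if g.lease_owner is None or g.lease_expires_at <= now]
            if free:
                best = max(free, key=score)
                self._grant(best, job_id, priority, now)
                return best.gpu_id, None
            # Preemption: only strictly lower-priority holders qualify.
            # Rank their GPUs by score first; on a score tie, evict the
            # lowest-priority job.
            victims = [g for g in self.gpus.values()
                       if g.lease_priority < priority]
            if not victims:
                return None
            best = max(victims, key=lambda g: (score(g), -g.lease_priority))
            evicted = best.lease_owner
            self._grant(best, job_id, priority, now)
            return best.gpu_id, evicted
```

With two GPUs scored 1.0 and 2.0, two priority-1 jobs fill them, a third priority-1 job gets `None` instead of a shared GPU, and a priority-5 job preempts whoever holds the higher-scored GPU. A unit test for each of those four calls maps directly onto the three failing-test areas.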

Explain how you would debug these issues, which code areas you would inspect, what minimal fixes you would make, and how you would validate the results with tests and logs.
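For the log side of validation, a small scanner can replay assignment events and flag any GPU handed to a second job while an earlier lease is still live. The log line format, the `ASSIGN`/`RELEASE` event names, and the convention that a preemption is logged as a `RELEASE` of the victim before the new `ASSIGN` are all hypothetical; the regex would need to be adapted to whatever the repository actually logs.

```python
import re

# Hypothetical log format (adapt to the real logs):
#   t=12.0 ASSIGN gpu=g0 job=j1 lease_until=312.0
#   t=40.0 RELEASE gpu=g0 job=j1
LINE = re.compile(
    r"t=(?P<t>[\d.]+) (?P<op>ASSIGN|RELEASE) gpu=(?P<gpu>\S+) job=(?P<job>\S+)"
    r"(?: lease_until=(?P<until>[\d.]+))?"
)


def find_double_allocations(lines):
    """Return (time, gpu, holder, intruder) for every ASSIGN that lands on
    a GPU whose previous lease is neither released nor expired."""
    active = {}  # gpu -> (job, lease_until)
    problems = []
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # ignore unrelated log lines
        t, gpu, job = float(m["t"]), m["gpu"], m["job"]
        if m["op"] == "RELEASE":
            if active.get(gpu, (None,))[0] == job:
                del active[gpu]
            continue
        holder = active.get(gpu)
        if holder and holder[0] != job and holder[1] > t:
            problems.append((t, gpu, holder[0], job))
        # An ASSIGN without lease_until is the missing-lease symptom itself;
        # treat it as open-ended so any later overlap is still reported.
        until = float(m["until"]) if m["until"] else float("inf")
        active[gpu] = (job, until)
    return problems
```

Running it over logs captured before and after the fix gives a quick cross-check on the unit tests: if the missing lease is the root cause, the pre-fix logs would be expected to produce entries, and the post-fix logs should produce an empty list.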
