PracHub

Describe the model-to-GPU execution pipeline

Last updated: Mar 29, 2026

Quick Overview

This question evaluates knowledge of the end-to-end model-to-GPU execution pipeline, including frontend model representations, intermediate representations, lowering/compilation to device code, runtime memory and scheduling, and common compiler/runtime optimizations with their trade-offs.


Company: NVIDIA

Role: Software Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Technical Screen

Walk through the stages from defining a model to executing it on a GPU: the frontend representation, exporting to an intermediate graph (such as ONNX or a similar IR), compiling to device code, and runtime execution. Describe common compiler optimization techniques (for example, kernel fusion, quantization, and operator specialization), discuss their trade-offs, and contrast considerations for data-center versus edge hardware targets.

Jul 31, 2025, 12:00 AM
From Model Definition to GPU Execution: Pipeline and Optimizations

You are asked to explain the end-to-end path a machine learning model takes from authoring to high-performance inference on a GPU.

Task

Walk through the stages below and describe what happens at each step:

  1. Frontend representation
    • How a model is defined in a high-level framework (e.g., dynamic vs. static graphs, tracing vs. scripting).
  2. Export to an intermediate representation (IR)
    • Exporting to ONNX or a similar IR; making shapes/layouts explicit; simplifying the graph.
  3. Compilation to device code
    • Lowering from IR to kernel calls or device code; scheduling; autotuning; static vs. dynamic shapes.
  4. Runtime execution
    • Memory management, kernel launch, streams, batching, and handling dynamic inputs.
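The four stages above can be sketched as a toy pipeline in plain Python. This is only an illustration under stated assumptions: `Node`, `infer_shapes`, `lower`, `run`, and the `k_*` kernels are hypothetical names invented here, the "IR" is a hand-written list rather than a real ONNX export, and the "kernels" are Python functions standing in for device code.

```python
from dataclasses import dataclass

@dataclass
class Node:
    op: str        # operator name, e.g. "matmul"
    inputs: list   # names of input tensors
    output: str    # name of the output tensor

# Stages 1-2: a hand-written graph standing in for an exported IR.
graph = [
    Node("matmul", ["x", "w"], "h"),
    Node("add", ["h", "b"], "y"),
    Node("relu", ["y"], "out"),
]

def infer_shapes(graph, input_shapes):
    """Make every intermediate shape explicit, as an exporter would."""
    shapes = dict(input_shapes)
    for n in graph:
        if n.op == "matmul":
            (m, k), (k2, p) = shapes[n.inputs[0]], shapes[n.inputs[1]]
            assert k == k2, "inner dimensions must match"
            shapes[n.output] = (m, p)
        else:  # add/relu keep the shape of their first input
            shapes[n.output] = shapes[n.inputs[0]]
    return shapes

# Stage 3: "lowering" maps each IR op to a concrete kernel.
def k_matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def k_add(a, bias):
    return [[x + bv for x, bv in zip(row, bias)] for row in a]

def k_relu(a):
    return [[max(0.0, x) for x in row] for row in a]

KERNELS = {"matmul": k_matmul, "add": k_add, "relu": k_relu}

def lower(graph):
    """Turn the graph into an ordered launch plan of (kernel, inputs, output)."""
    return [(KERNELS[n.op], n.inputs, n.output) for n in graph]

# Stage 4: the runtime owns the buffers and launches kernels in order.
def run(plan, feeds):
    buffers = dict(feeds)
    for kernel, inputs, output in plan:
        buffers[output] = kernel(*(buffers[name] for name in inputs))
    return buffers

shapes = infer_shapes(graph, {"x": (1, 2), "w": (2, 2), "b": (2,)})
plan = lower(graph)
result = run(plan, {"x": [[1.0, -2.0]], "w": [[1.0, 0.0], [0.0, 1.0]], "b": [0.5, 0.5]})
print(shapes["out"], result["out"])  # (1, 2) [[1.5, 0.0]]
```

A real stack separates these steps far more sharply: shape inference and simplification run at export time, lowering and autotuning at compile time, and only the launch loop runs per request.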

Then, discuss common compiler/runtime optimization techniques and their trade-offs, including:

  • Kernel/operator fusion
  • Quantization (e.g., INT8, FP16/FP8)
  • Operator specialization and autotuning
  • Layout and precision selection
  • Memory planning and graph partitioning
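Two of the techniques above, fusion and INT8 quantization, can be illustrated with a small sketch. It assumes a chain of elementwise operations and symmetric per-tensor quantization; the function names are hypothetical, and the "pass count" is a stand-in for memory traffic rather than a measurement.

```python
def unfused(xs, scale, bias):
    """Three separate 'kernels': each one writes a full intermediate buffer."""
    t1 = [x * scale for x in xs]       # kernel 1: multiply
    t2 = [t + bias for t in t1]        # kernel 2: add
    out = [max(0.0, t) for t in t2]    # kernel 3: ReLU
    return out, 3                      # 3 passes over memory

def fused(xs, scale, bias):
    """One 'kernel': intermediates stay in registers (local values here)."""
    return [max(0.0, x * scale + bias) for x in xs], 1

def quantize_int8(xs):
    """Symmetric per-tensor INT8: q = round(x / s) with s = max|x| / 127."""
    max_abs = max(abs(x) for x in xs)
    s = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / s))) for x in xs]
    return q, s

def dequantize(q, s):
    return [v * s for v in q]

xs = [-1.0, 0.0, 0.5, 1.0]
out_a, passes_a = unfused(xs, 2.0, 0.25)
out_b, passes_b = fused(xs, 2.0, 0.25)
print(out_a == out_b, passes_a, passes_b)  # True 3 1

q, s = quantize_int8(xs)
worst = max(abs(a - b) for a, b in zip(xs, dequantize(q, s)))
print(worst <= s / 2 + 1e-12)  # True: rounding error is at most half a step
```

The trade-offs show up even at this scale. Fusion removes intermediate buffers but produces one specialized kernel per pattern, which costs compile time and code size. Quantization bounds the per-element error by half the step size `s`, but a single outlier inflates `s` for the whole tensor, which is one reason calibration and per-channel scales are used in practice.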

Finally, contrast considerations for data-center versus edge hardware targets:

  • Throughput vs. latency priorities
  • Batch size, power/thermal limits, memory budgets
  • JIT vs. AOT, startup time, binary size, determinism
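The throughput-versus-latency tension can be made concrete with a toy cost model. It assumes each batch pays a fixed overhead plus a per-sample cost; the constants (`fixed_ms=2.0`, `per_sample_ms=0.5`) and function names are invented for illustration and do not describe any particular GPU.

```python
def batch_latency_ms(batch, fixed_ms=2.0, per_sample_ms=0.5):
    """Time for one batch: fixed overhead (launches, weight reads) plus per-sample work."""
    return fixed_ms + per_sample_ms * batch

def throughput_per_s(batch, **kw):
    """Samples completed per second when running back-to-back batches."""
    return batch / batch_latency_ms(batch, **kw) * 1000.0

def largest_batch_within(budget_ms, max_batch=256, **kw):
    """Largest batch whose latency fits the budget, or 0 if even batch 1 does not."""
    best = 0
    for b in range(1, max_batch + 1):
        if batch_latency_ms(b, **kw) <= budget_ms:
            best = b
    return best

for b in (1, 32):
    print(b, batch_latency_ms(b), round(throughput_per_s(b)))
# 1 2.5 400
# 32 18.0 1778

print(largest_batch_within(10.0))  # 16
```

Under these assumed numbers, batch 32 delivers roughly 4.4 times the throughput of batch 1 but takes 18 ms per request, which suits a data-center service optimizing for throughput. An edge device with a 10 ms deadline would be capped at batch 16 and, with a single input stream, typically runs at batch 1.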

Assume a modern GPU software stack with an IR (e.g., ONNX, MLIR, TVM Relay, or XLA's HLO), a compiler/runtime (e.g., ONNX Runtime, TensorRT, or another vendor-specific runtime), and access to common math libraries (e.g., BLAS and DNN libraries such as cuBLAS and cuDNN). Keep your explanation structured and concise.
