PracHub

Design multi-GPU matrix multiplication

Last updated: Apr 29, 2026

Quick Overview

This question evaluates proficiency in multi-GPU parallelism and system-level ML engineering, covering data partitioning, inter-GPU communication primitives, compute scheduling and overlap, memory layout and buffer reuse, numerical precision trade-offs, synchronization, scalability, and failure handling.

  • hard
  • Google
  • ML System Design
  • Machine Learning Engineer


Company: Google

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

Design and implement the computation C = A × B across two GPUs, where A and B must reside on both devices. Specify: data partitioning (row/column/block tiling); communication primitives (e.g., all-reduce, all-gather, point-to-point); compute scheduling (tiled GEMM with overlap of compute and communication); memory layout and buffer reuse; numerical precision; synchronization; and how you aggregate and return C. Discuss scalability and failure handling.
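Under the replication constraint, the simplest viable plan is to partition the rows of C: since both GPUs hold full copies of A and B, each GPU can compute its half of C's rows with no communication until the final gather. A minimal sketch that simulates the two devices with NumPy (the two-way split at `m // 2` is an illustrative assumption, not part of the original prompt):

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 6, 5, 4
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))

# Both "GPUs" hold full A and B (the replication constraint);
# each one owns half of C's rows and computes them independently.
split = m // 2
C_gpu0 = A[:split] @ B      # rows 0 .. split-1 of C
C_gpu1 = A[split:] @ B      # rows split .. m-1 of C

# Gather step: concatenating the row blocks reassembles C.
C = np.concatenate([C_gpu0, C_gpu1], axis=0)
assert np.allclose(C, A @ B)
```

Because the partial results are disjoint row blocks, the gather maps to an all-gather (if both GPUs need full C) or point-to-point copies (if one GPU or the host collects it); no reduction is required.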


Related Interview Questions

  • Design an app-store app recommendation system - Google (medium)
  • Design a chatbot over structured and unstructured data - Google (medium)
  • Design a fraud detection system - Google (medium)
  • Choose Fast or Cheap Models - Google
  • Design ML system for self-driving perception - Google (medium)
Asked at Google · Sep 6, 2025 · Machine Learning Engineer · Technical Screen · ML System Design

Multi-GPU MatMul (2 GPUs): Design and Implementation

You are given two GPUs connected via NVLink or PCIe. You must compute C = A × B where:

  • A is shape m × k and B is shape k × n.
  • Constraint: A and B must be resident on both devices (i.e., replicated on GPU0 and GPU1).
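Replication makes the memory budget easy to bound: each GPU must hold all of A, all of B, and at least its own share of C. A back-of-envelope helper (the 8192-cube FP16 example sizes are illustrative assumptions):

```python
def per_gpu_bytes(m, k, n, dtype_bytes=2, num_gpus=2):
    """Bytes resident on one GPU when A (m x k) and B (k x n) are
    fully replicated and C (m x n) is partitioned across GPUs."""
    a = m * k * dtype_bytes                 # full copy of A
    b = k * n * dtype_bytes                 # full copy of B
    c = (m * n * dtype_bytes) // num_gpus   # this GPU's slice of C
    return a + b + c

# 8192^3 GEMM in FP16: 128 MiB (A) + 128 MiB (B) + 64 MiB (half of C)
print(per_gpu_bytes(8192, 8192, 8192) / 2**20)  # → 320.0
```

This bound ignores scratch/workspace buffers; a real budget would also reserve space for communication staging buffers and any double-buffered tiles.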

Design a solution that includes:

  1. Data partitioning
  • How you partition the output C across the two GPUs (row/column/block tiling).
  2. Communication primitives
  • Which collectives or point-to-point operations you will use (e.g., all-reduce, all-gather, send/recv), and when.
  3. Compute scheduling
  • The GEMM tiling strategy on each GPU.
  • How you overlap compute with any required communication.
  4. Memory layout and buffer reuse
  • Leading dimensions, alignment, submatrix addressing, scratch/temporary buffers, and reuse.
  5. Numerical precision
  • Dtypes, tensor-core utilization, accumulation precision, and determinism trade-offs.
  6. Synchronization
  • Streams, events/barriers, and how you ensure correctness.
  7. Aggregation and return of C
  • How you assemble and return C (to one GPU, to both GPUs, or to host) under the replication constraint for A and B.
  8. Scalability and failure handling
  • How the approach scales beyond two GPUs and what changes you would make.
  • Failure detection, retries, and graceful degradation.
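Points 3 and 5 interact: a tiled GEMM typically walks the reduction (K) dimension in blocks, and with FP16 inputs each block's partial product should be accumulated in FP32, which mirrors how tensor cores accumulate in hardware (e.g., cuBLAS compute type CUBLAS_COMPUTE_32F). A sketch of that pattern in NumPy, with an assumed tile size of 64:

```python
import numpy as np

def tiled_gemm_fp16(A16, B16, tile_k=64):
    """Tile the reduction (K) dimension; cast each FP16 tile up and
    accumulate in an FP32 accumulator, as tensor cores do."""
    m, k = A16.shape
    _, n = B16.shape
    acc = np.zeros((m, n), dtype=np.float32)   # FP32 accumulator
    for k0 in range(0, k, tile_k):
        a = A16[:, k0:k0 + tile_k].astype(np.float32)
        b = B16[k0:k0 + tile_k, :].astype(np.float32)
        acc += a @ b   # partial product for this K tile
    return acc

rng = np.random.default_rng(1)
A16 = rng.standard_normal((128, 256)).astype(np.float16)
B16 = rng.standard_normal((256, 64)).astype(np.float16)

ref = A16.astype(np.float32) @ B16.astype(np.float32)
assert np.allclose(tiled_gemm_fp16(A16, B16), ref, atol=1e-2)
```

The per-tile loop is also where compute/communication overlap would slot in on real hardware: while tile k0 runs on the compute stream, tile k0 + tile_k (or an outbound result block) can move on a separate copy stream, with events enforcing ordering.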

State any minimal assumptions you need (e.g., matrices fit in GPU memory, NCCL/CUDA available) and provide enough detail that an engineer could implement the system.
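For the scalability discussion, it helps to contrast the row-split scheme with the main alternative: splitting the reduction (K) dimension, where each GPU computes a full-size partial C and an all-reduce (sum) produces the final result. This trades a cheap gather for a more expensive reduction, but becomes attractive when K is very large or when C must end up replicated anyway. A NumPy simulation, with element-wise addition standing in for an NCCL all-reduce (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, n = 8, 10, 6
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))

# K-split: each GPU takes half the reduction dimension and
# produces a full m x n *partial* C.
half = k // 2
partial0 = A[:, :half] @ B[:half, :]   # GPU0's contribution
partial1 = A[:, half:] @ B[half:, :]   # GPU1's contribution

# The element-wise sum plays the role of an all-reduce:
# after it, both GPUs would hold the complete C.
C = partial0 + partial1
assert np.allclose(C, A @ B)
```

Note the determinism angle from point 5: summation order in an all-reduce can vary across runs and topologies, so a K-split scheme may produce bitwise-different (though numerically equivalent) results, while the row-split scheme is deterministic per GPU.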


© 2026 PracHub. All rights reserved.