Design multithreaded CPU convolution

Q: Design multithreaded CPU convolution

This question tests parallel algorithm design, multithreading and synchronization, CPU microarchitecture awareness (SIMD and cache‑line behavior), memory locality and false‑sharing avoidance, and numerical performance engineering, all applied to implementing valid 1‑D convolution on multicore CPUs.

Question

Multithreaded CPU Design for Valid 1‑D Convolution

Assume valid 1‑D convolution produces output y of length M = N − K + 1 from input x (length N) and kernel h (length K):

y[i] = Σ_{j=0}^{K−1} x[i + j] · h[j], for i = 0 … M−1.
Arrays are contiguous in memory; dtype is float32 or float64.
Target: general-purpose multicore CPU with SIMD (e.g., AVX2/AVX‑512), typical 64‑B cache lines.
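
To pin down the definition before any parallel design, here is a minimal serial reference sketch in Python. The function name `conv1d_valid` is hypothetical, and plain lists stand in for the contiguous float arrays. Note that the formula as stated does not flip the kernel, so the sketch does not either.

```python
def conv1d_valid(x, h):
    """Serial reference: y[i] = sum_{j=0}^{K-1} x[i + j] * h[j], i = 0 .. M-1."""
    n, k = len(x), len(h)
    if k < 1 or k > n:
        raise ValueError("need 1 <= K <= N")
    m = n - k + 1  # output length M = N - K + 1
    return [sum(x[i + j] * h[j] for j in range(k)) for i in range(m)]
```

With K = N the output has length M = 1, so the whole computation collapses to a single dot product of x and h. A reference like this is mainly useful as a correctness check for any threaded or vectorized version.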

Design and implement a multithreaded CPU version. For each case, describe work partitioning, scheduling, synchronization, cache locality, false‑sharing avoidance, vectorization (SIMD), and how to combine partial results. Provide pseudocode or an API‑level design.

Cases:

(a) input length N = 1,000,000; kernel length K = 3.

(b) input length N = 1,000,000; kernel length K = 1,000,000.

(c) maximum worker threads = 100.
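
The three cases pull the partitioning in different directions: in (a) M is close to N, so the outputs can be split across threads, while in (b) M = 1, so the only parallelism is in the reduction over j. As a rough sketch of one possible approach, not a reference solution, the Python below shows both splits under a thread cap. The names `split_range` and `conv1d_valid_threaded` and the `m >= k` heuristic are hypothetical. The alignment step assumes the output buffer starts on a 64‑B boundary. Standard CPython's global interpreter lock means this illustrates partitioning and combining only, not real speedup or SIMD.

```python
from concurrent.futures import ThreadPoolExecutor

LINE_BYTES = 64  # cache-line size given in the problem statement


def split_range(total, parts, align=1):
    """Split [0, total) into at most `parts` contiguous ranges whose
    interior boundaries are multiples of `align` elements."""
    blocks = -(-total // align)  # ceil(total / align)
    parts = max(1, min(parts, blocks))
    base, extra = divmod(blocks, parts)
    ranges, start = [], 0
    for p in range(parts):
        end = start + base + (1 if p < extra else 0)
        ranges.append((start * align, min(end * align, total)))
        start = end
    return ranges


def conv1d_valid_threaded(x, h, max_threads=100, elem_bytes=4):
    n, k = len(x), len(h)
    m = n - k + 1
    if k < 1 or m < 1:
        raise ValueError("need 1 <= K <= N")
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        if m >= k:  # hypothetical heuristic: many outputs, short kernel (case a)
            # Each worker owns a disjoint output range; boundaries fall on
            # whole cache lines so two workers never write the same line.
            align = LINE_BYTES // elem_bytes
            ranges = split_range(m, max_threads, align)

            def work(r):
                return [sum(x[i + j] * h[j] for j in range(k))
                        for i in range(r[0], r[1])]

            pieces = pool.map(work, ranges)  # results come back in range order
            return [v for piece in pieces for v in piece]

        # Few outputs, long kernel (case b): split the reduction over j.
        ranges = split_range(k, max_threads)

        def work(r):
            return [sum(x[i + j] * h[j] for j in range(r[0], r[1]))
                    for i in range(m)]

        partials = list(pool.map(work, ranges))
        # Combine partial sums in a fixed order so the result is deterministic.
        return [sum(col) for col in zip(*partials)]
```

In the output split no synchronization is needed beyond the final join, because the write ranges are disjoint. In the reduction split each worker keeps a private partial sum and the combine step runs once at the end, which avoids contended atomic updates on a shared accumulator.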
