Design multithreaded CPU convolution

This is a System Design interview question from IBM for Software Engineer roles.

Question

Multithreaded CPU Design for Valid 1‑D Convolution

Assume a valid 1‑D convolution that produces an output y of length M = N − K + 1 from an input x (length N) and a kernel h (length K):

y[i] = Σ_{j=0}^{K−1} x[i + j] · h[j], for i = 0 … M−1.
Arrays are contiguous in memory; dtype is float32 or float64.
Target: general-purpose multicore CPU with SIMD (e.g., AVX2/AVX‑512), typical 64‑B cache lines.
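
As a baseline for checking any parallel version, the definition above can be written as a short serial reference. This is a minimal sketch in pure Python rather than an optimized kernel; the function name `conv1d_valid` is hypothetical and chosen here for illustration. Note that the formula as stated does not flip the kernel, and the sketch follows it literally.

```python
from typing import List, Sequence


def conv1d_valid(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Serial reference: y[i] = sum over j of x[i + j] * h[j], for i = 0 .. M-1."""
    n, k = len(x), len(h)
    if k == 0 or k > n:
        raise ValueError("need 1 <= K <= N for a valid convolution")
    m = n - k + 1  # output length M = N - K + 1
    # Each output element is an independent dot product of h with a window of x.
    return [sum(x[i + j] * h[j] for j in range(k)) for i in range(m)]
```

For example, `conv1d_valid([1, 2, 3, 4, 5], [1, 0, -1])` returns `[-2, -2, -2]`. With the sizes in the cases below, M = 999,998 for K = 3 and M = 1 for K = N = 1,000,000, which is why the two cases call for different partitioning.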

Design and implement a multithreaded CPU version. For each case, describe work partitioning, scheduling, synchronization, cache locality, false‑sharing avoidance, vectorization (SIMD), and how to combine partial results. Provide pseudocode or an API‑level design.

Cases:

(a) input length N = 1,000,000; kernel length K = 3.

(b) input length N = 1,000,000; kernel length K = 1,000,000.

(c) at most 100 worker threads are available.