Design multithreaded 1D convolution strategies

Q: Design multithreaded 1D convolution strategies

This is a System Design interview question from IBM for Software Engineer roles. View the full question and solution on PracHub.

Q: How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

Question

Optimize CPU Multithreaded 1D "Valid" Convolution

Context

You are given a 1D input array x of length N and a kernel h of length K. The "valid" convolution produces an output array y of length L_out = N − K + 1, where:

y[i] = sum_{j=0}^{K-1} x[i + j] * h[K − 1 − j]
Index i ranges from 0 to L_out − 1.

Assume a typical CPU with a shared last-level cache, private L1/L2 per core, 64-byte cache lines, and support for multiple threads.

Task

Optimize the valid 1D convolution for CPU hardware using multithreading. For each case, describe and implement (code or pseudocode) how to:

Partition work,
Assign threads,
Combine results,
Address load balancing, cache locality, and avoiding false sharing,
Synchronize/merge outputs,
Choose chunk sizes and the number of threads.

Consider these cases:

N = 1,000,000; K = 3,
N = 1,000,000; K = 1,000,000,
Maximum available threads = 100.

Design multithreaded 1D convolution strategies

Optimize CPU Multithreaded 1D "Valid" Convolution

Context

Task

Solution (Locked)

Comments (0)