This question evaluates skills in parallel algorithm design, multithreading and synchronization, CPU microarchitecture awareness (SIMD and cache-line behavior), memory locality and false‑sharing avoidance, and numerical performance engineering for implementing valid 1‑D convolution on multicore CPUs.
Assume valid 1‑D convolution, which produces an output y of length M = N − K + 1 from an input x (length N) and a kernel h (length K).
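For reference, under the standard flipped-kernel definition of convolution, each output sample can be written as below. The zero-based indexing is an assumption made here; some libraries instead compute the unflipped (cross-correlation) form, which only reverses h.

```latex
\[
  y[i] \;=\; \sum_{k=0}^{K-1} h[k]\, x[i + K - 1 - k],
  \qquad 0 \le i < M, \quad M = N - K + 1 .
\]
```

Every index i + K − 1 − k stays within 0 … N − 1, so no padding of x is needed.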
Design and implement a multithreaded CPU version. For each case, describe work partitioning, scheduling, synchronization, cache locality, false‑sharing avoidance, vectorization (SIMD), and how to combine partial results. Provide pseudocode or an API‑level design.
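One possible starting point for the work-partitioning and false-sharing parts of the design is a helper that cuts an index range into contiguous, cache-line-aligned chunks, one per worker. This is a minimal sketch, not a prescribed API: the name `split_range`, the constant `kCacheLineBytes`, and the 64-byte line size are assumptions, and the alignment only helps if the output buffer itself starts on a cache-line boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed cache-line size; 64 bytes is common on x86-64 but not universal.
constexpr std::size_t kCacheLineBytes = 64;
constexpr std::size_t kFloatsPerLine = kCacheLineBytes / sizeof(float);

// Hypothetical helper: split [0, total) into at most max_parts contiguous
// [begin, end) ranges. Interior boundaries are multiples of `align` elements,
// so two workers do not write into the same cache line of an aligned buffer.
std::vector<std::pair<std::size_t, std::size_t>>
split_range(std::size_t total, std::size_t max_parts, std::size_t align) {
    assert(max_parts > 0 && align > 0);
    std::vector<std::pair<std::size_t, std::size_t>> parts;
    if (total == 0) return parts;
    // Count aligned blocks, then hand each part a whole number of blocks.
    const std::size_t blocks = (total + align - 1) / align;
    const std::size_t n_parts = std::min(max_parts, blocks);
    const std::size_t blocks_per_part = (blocks + n_parts - 1) / n_parts;
    const std::size_t step = blocks_per_part * align;
    for (std::size_t begin = 0; begin < total; begin += step) {
        parts.emplace_back(begin, std::min(begin + step, total));
    }
    return parts;
}
```

Static, equal-sized chunks suit this problem because every output costs the same K multiply-adds; dynamic scheduling would mainly add overhead. A call such as `split_range(M, 100, kFloatsPerLine)` yields the per-thread output ranges.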
Cases:
(a) input length N = 1,000,000; kernel length K = 3.
(b) input length N = 1,000,000; kernel length K = 1,000,000.
(c) the number of worker threads is capped at 100.
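The two sizes call for different strategies. In (a), M = 999,998 outputs each need only 3 multiply-adds, so the outputs can be partitioned across threads with no shared writes. In (b), M = 1, so the single output is one 1,000,000-term dot product that has to be split across threads and then reduced. The following is a minimal sketch under stated assumptions rather than a definitive implementation: `conv_valid_mt`, `PaddedSum`, the `M >= K` strategy switch, and the 64-byte cache line are all choices made for illustration, and vectorization is left to the compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// One partial sum per 64-byte stride (assumed cache-line size), so two
// threads never write into the same line.
struct PaddedSum {
    double value = 0.0;
    char pad[64 - sizeof(double)] = {};
};

// Hypothetical entry point: y[i] = sum_k h[k] * x[i + K - 1 - k].
std::vector<float> conv_valid_mt(const std::vector<float>& x,
                                 const std::vector<float>& h,
                                 std::size_t max_threads = 100) {
    const std::size_t N = x.size(), K = h.size();
    assert(K >= 1 && N >= K && max_threads >= 1);
    const std::size_t M = N - K + 1;
    std::vector<float> y(M, 0.0f);
    const float* xp = x.data();
    const float* hp = h.data();

    if (M >= K) {
        // Case (a) style: partition the outputs; each thread owns y[begin, end).
        const std::size_t T = std::min(max_threads, M);
        std::size_t chunk = (M + T - 1) / T;
        chunk = (chunk + 15) / 16 * 16;  // multiple of 16 floats (64 bytes)
        float* yp = y.data();
        std::vector<std::thread> workers;
        for (std::size_t begin = 0; begin < M; begin += chunk) {
            const std::size_t end = std::min(begin + chunk, M);
            workers.emplace_back([xp, hp, yp, K, begin, end] {
                // Tap-outer, output-inner: the inner loop is a unit-stride
                // multiply-add that compilers can usually auto-vectorize.
                for (std::size_t k = 0; k < K; ++k) {
                    const float hk = hp[k];
                    const float* xs = xp + (K - 1 - k);
                    for (std::size_t i = begin; i < end; ++i)
                        yp[i] += hk * xs[i];
                }
            });
        }
        for (auto& w : workers) w.join();  // the join is the only sync point
    } else {
        // Case (b) style: split the taps of each output, then reduce.
        const std::size_t T = std::min(max_threads, K);
        const std::size_t chunk = (K + T - 1) / T;
        for (std::size_t i = 0; i < M; ++i) {
            std::vector<PaddedSum> partial(T);
            std::vector<std::thread> workers;
            std::size_t t = 0;
            for (std::size_t begin = 0; begin < K; begin += chunk, ++t) {
                const std::size_t end = std::min(begin + chunk, K);
                PaddedSum* slot = &partial[t];
                workers.emplace_back([xp, hp, K, i, begin, end, slot] {
                    double acc = 0.0;  // thread-local; written out once
                    for (std::size_t k = begin; k < end; ++k)
                        acc += static_cast<double>(hp[k]) * xp[i + K - 1 - k];
                    slot->value = acc;
                });
            }
            for (auto& w : workers) w.join();
            double total = 0.0;  // serial reduction over at most T partials
            for (const auto& p : partial) total += p.value;
            y[i] = static_cast<float>(total);
        }
    }
    return y;
}
```

Partial sums are combined serially after the join because at most 100 values are involved; accumulating in double limits rounding error over a million terms. Spawning threads per call keeps the sketch short, whereas a production version would more likely reuse a thread pool and size it to the hardware core count rather than always using the full cap.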