Design multithreaded CPU convolution

Last updated: Mar 29, 2026

Quick Overview

This question evaluates skills in parallel algorithm design, multithreading and synchronization, CPU microarchitecture awareness (SIMD and cache-line behavior), memory locality and false‑sharing avoidance, and numerical performance engineering for implementing valid 1‑D convolution on multicore CPUs.

Company: IBM

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Take-home Project

Sep 6, 2025, 12:00 AM

Multithreaded CPU Design for Valid 1‑D Convolution

Assume valid 1‑D convolution produces output y of length M = N − K + 1 from input x (length N) and kernel h (length K):

  • y[i] = Σ_{j=0}^{K−1} x[i + j] · h[j], for i = 0 … M−1.
  • Arrays are contiguous in memory; dtype is float32 or float64.
  • Target: general-purpose multicore CPU with SIMD (e.g., AVX2/AVX‑512), typical 64‑B cache lines.
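
As a correctness baseline, the definition above can be written directly as a scalar loop. The following is a minimal single-threaded sketch, not a tuned implementation; the function name `conv1d_valid_ref` and the use of `std::vector` are illustrative assumptions. It follows the formula as given (no kernel flip) and is mainly useful for checking a parallel version against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference valid 1-D convolution, exactly as defined in the text:
//   y[i] = sum_{j=0}^{K-1} x[i + j] * h[j],  i = 0 .. M-1,  M = N - K + 1.
// T is expected to be float or double. The name is hypothetical.
template <typename T>
std::vector<T> conv1d_valid_ref(const std::vector<T>& x, const std::vector<T>& h) {
    const std::size_t N = x.size();
    const std::size_t K = h.size();
    assert(K >= 1 && N >= K);  // "valid" mode needs at least one full overlap
    const std::size_t M = N - K + 1;

    std::vector<T> y(M, T(0));
    for (std::size_t i = 0; i < M; ++i) {
        T acc = T(0);
        // Both x[i + j] and h[j] are read with unit stride.
        for (std::size_t j = 0; j < K; ++j) {
            acc += x[i + j] * h[j];
        }
        y[i] = acc;
    }
    return y;
}
```

The cost is M · K multiply-adds, which is the quantity any threaded version has to divide among workers.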

Design and implement a multithreaded CPU version. For each case, describe work partitioning, scheduling, synchronization, cache locality, false‑sharing avoidance, vectorization (SIMD), and how to combine partial results. Provide pseudocode or an API‑level design.
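
One possible way to approach the work-partitioning and false-sharing parts of this task is to give each thread a contiguous output range whose boundaries fall on cache-line multiples. The helper below is a sketch under stated assumptions: the name `partition_outputs` is hypothetical, `elems_per_line` would be 16 for float32 or 8 for float64 on a 64-byte line, and the output buffer is assumed to start on a cache-line boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split the output index range [0, M) into at most `threads` contiguous
// half-open ranges [begin, end). Every boundary except the final `end`
// is a multiple of `elems_per_line`, so two threads never write to the
// same cache line of y, provided y itself starts on a line boundary
// (an assumption; a plain std::vector does not guarantee that).
inline std::vector<std::pair<std::size_t, std::size_t>>
partition_outputs(std::size_t M, std::size_t threads, std::size_t elems_per_line) {
    assert(threads >= 1 && elems_per_line >= 1);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (M == 0) return ranges;

    // Number of cache lines needed to cover all M outputs.
    const std::size_t lines = (M + elems_per_line - 1) / elems_per_line;
    // Never create more ranges than there are lines to hand out.
    const std::size_t parts = std::min(threads, lines);
    const std::size_t base = lines / parts;
    const std::size_t extra = lines % parts;  // first `extra` ranges get one more line

    std::size_t line = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t n = base + (p < extra ? 1 : 0);
        const std::size_t begin = line * elems_per_line;
        const std::size_t end = std::min(M, (line + n) * elems_per_line);
        ranges.emplace_back(begin, end);
        line += n;
    }
    return ranges;
}
```

For example, with M = 999,998 float32 outputs and 100 threads, this yields 100 ranges starting at multiples of 10,000; the last range covers outputs 990,000 to 999,997.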

Cases:

(a) input length N = 1,000,000; kernel length K = 3.

(b) input length N = 1,000,000; kernel length K = 1,000,000.

(c) maximum worker threads = 100.
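
Since M = N − K + 1, case (a) has M = 999,998 outputs while case (b) has M = 1, so (a) parallelizes naturally over outputs, whereas (b) is a single dot product of length 1,000,000 that has to be parallelized over the reduction index j. The sketch below shows one way to cover both under the thread cap of case (c). Everything named here (`conv1d_valid_mt`, `PaddedSum`, the `max_threads` parameter, the 64-byte line size, double-precision partial sums) is an illustrative assumption, and SIMD is left to the compiler or to a hand-written kernel that would replace the inner loops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// One partial sum per worker, padded to an assumed 64-byte cache line so
// that neighbouring workers do not write to the same line.
struct alignas(64) PaddedSum {
    double value = 0.0;
};

// Hypothetical entry point: valid 1-D convolution with a worker cap.
inline std::vector<float> conv1d_valid_mt(const std::vector<float>& x,
                                          const std::vector<float>& h,
                                          std::size_t max_threads = 100) {
    const std::size_t N = x.size(), K = h.size();
    assert(K >= 1 && N >= K && max_threads >= 1);
    const std::size_t M = N - K + 1;
    std::vector<float> y(M, 0.0f);

    // Case (c): respect the cap, and do not oversubscribe the machine.
    // hardware_concurrency() may return 0 when the value is unknown.
    const std::size_t hw = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    const std::size_t workers = std::min(max_threads, hw);
    std::vector<std::thread> pool;

    if (M >= workers) {
        // Case (a): many outputs. Static contiguous output ranges, sized as a
        // multiple of 16 floats (64 bytes). Boundaries are line-aligned only if
        // y's storage starts on a line boundary; otherwise two neighbours share
        // at most one boundary line.
        const std::size_t per = ((M + workers - 1) / workers + 15) / 16 * 16;
        for (std::size_t begin = 0; begin < M; begin += per) {
            const std::size_t end = std::min(M, begin + per);
            pool.emplace_back([&x, &h, &y, K, begin, end] {
                for (std::size_t i = begin; i < end; ++i) {
                    float acc = 0.0f;
                    for (std::size_t j = 0; j < K; ++j) acc += x[i + j] * h[j];
                    y[i] = acc;  // each output is written by exactly one thread
                }
            });
        }
        for (auto& t : pool) t.join();  // the join is the only synchronization
        return y;
    }

    // Case (b): fewer outputs than workers (M = 1 when K = N). Split the
    // reduction over j; each worker writes one padded partial sum.
    std::vector<PaddedSum> partial(workers);
    const std::size_t per = (K + workers - 1) / workers;
    for (std::size_t i = 0; i < M; ++i) {
        pool.clear();
        for (std::size_t t = 0; t < workers; ++t) {
            const std::size_t jb = std::min(K, t * per);
            const std::size_t je = std::min(K, jb + per);
            pool.emplace_back([&x, &h, &partial, i, t, jb, je] {
                double acc = 0.0;  // local accumulator, stored once at the end
                for (std::size_t j = jb; j < je; ++j)
                    acc += double(x[i + j]) * double(h[j]);
                partial[t].value = acc;
            });
        }
        for (auto& t : pool) t.join();
        // Combine partial results in a fixed order after all workers finish.
        double total = 0.0;
        for (const auto& p : partial) total += p.value;
        y[i] = static_cast<float>(total);
    }
    return y;
}
```

Static contiguous ranges are used because every output costs the same K multiply-adds, so dynamic scheduling would add overhead without improving balance. Partial sums are combined in a fixed order, so the result is reproducible for a given worker count, though it can differ in the last bits between worker counts because floating-point addition is not associative.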
