You are given a mocked “core kernel” function (similar in spirit to a GPU kernel / tight compute loop) that is functionally correct but slow.
Task
- Optimize the kernel to improve performance as much as possible within a fixed timebox (e.g., ~2 hours).
- You may use typical low-level optimization techniques such as:
  - loop unrolling
  - memory access optimization (e.g., coalescing / cache-friendly access)
  - reducing allocations and copies
  - operator fusion / reducing intermediate buffers
  - vectorization (SIMD) and/or parallelism where applicable
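A minimal sketch of the "reducing allocations" and "operator fusion" points, using a hypothetical NumPy kernel (the actual kernel in the task is mocked, so the function names and the computation here are illustrative assumptions): the baseline allocates a fresh intermediate array at every step, while the optimized version reuses a single output buffer with in-place operations.

```python
import numpy as np

def kernel_baseline(x):
    # Hypothetical slow kernel: every step allocates a new temporary array.
    a = x * 2.0        # temporary 1
    b = np.sin(a)      # temporary 2
    c = b + 1.0        # temporary 3 (the result)
    return c

def kernel_fused(x, out=None):
    # Same math, sin(2*x) + 1, fused into in-place ufunc calls
    # on one preallocated buffer: no intermediates beyond `out`.
    if out is None:
        out = np.empty_like(x)
    np.multiply(x, 2.0, out=out)
    np.sin(out, out=out)
    out += 1.0
    return out
```

Passing a reusable `out` buffer also lets a caller in a hot loop avoid one allocation per invocation, which is the same idea as reducing copies in the bullet list above.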
- Provide:
  - Your optimized implementation
  - A short write-up explaining what you changed and why
  - Benchmarks showing the speedup vs. the baseline
  - Evidence that you preserved correctness (tests or checks)
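The benchmark and correctness deliverables can share one small harness. A sketch, assuming Python and two hypothetical functions `kernel_baseline` and `kernel_optimized` (names are placeholders, not part of the task): time each implementation with warmup runs, report the best-of-N wall-clock time, and assert the outputs match.

```python
import time
import numpy as np

def bench(fn, x, repeats=5, warmup=2):
    # Warmup runs absorb one-time costs (JIT, cache fill, lazy imports).
    for _ in range(warmup):
        fn(x)
    # Best-of-N is a common choice: it filters out scheduler noise.
    best = min(
        (time.perf_counter() - t0)
        for t0 in (time.perf_counter() for _ in range(repeats))
        if fn(x) is not None or True  # run fn(x) inside the timed window
    )
    return best

def compare(baseline, optimized, x):
    # Correctness first: identical output semantics (use rtol/atol
    # appropriate to the kernel's precision requirements).
    assert np.allclose(baseline(x), optimized(x), rtol=1e-6, atol=1e-9)
    tb, to = bench(baseline, x), bench(optimized, x)
    return tb / to  # speedup factor vs. baseline
```

A usage note: report end-to-end numbers (the whole kernel call, including any buffer setup the caller must do), since that is what the constraints below ask you to optimize for.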
Constraints / expectations
- Maintain identical output semantics.
- Optimize for end-to-end runtime (not just micro-benchmarks of one line).
- Explain tradeoffs (readability vs. performance, portability, precision, etc.).