Performance Optimization Plan for a Compute Kernel
Context
You are given:
- A compute kernel (single critical function or set of loops) to optimize.
- A cycle-accurate simulator that both verifies functional correctness and reports runtime/cycle counts.
Goal: Within two weeks, achieve the largest possible speedup without changing the kernel's outputs.
Task
Describe your end-to-end plan to:
- Establish a reproducible baseline.
- Profile to find bottlenecks and form hypotheses.
- Select and apply optimizations, including:
  - Data-layout transformations.
  - Strength reductions via bitwise operations.
  - Hashing where helpful.
  - VLIW-style instruction scheduling (manual ILP and software pipelining).
- Validate correctness after each change.
- Avoid overfitting to the simulator.
- Quantify speedup improvements.
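As one illustration of the strength-reduction and validation items above, the following minimal sketch replaces multiplication, division, and modulo by a power of two with shift and mask operations, then compares the result against the original form over a range of inputs. The function names and the constant 8 are hypothetical and not part of the task. The equivalence is assumed only for non-negative integers; in languages such as C, signed division and remainder round differently from shifts and masks for negative values.

```python
def index_baseline(i: int) -> tuple:
    """Reference form: multiply, divide, and modulo by 8 (hypothetical constant)."""
    return i * 8, i // 8, i % 8


def index_reduced(i: int) -> tuple:
    """Strength-reduced form: shift and mask, valid because 8 == 1 << 3."""
    return i << 3, i >> 3, i & 7


def outputs_match(limit: int) -> bool:
    """Compare both forms on every non-negative input in [0, limit)."""
    return all(index_baseline(i) == index_reduced(i) for i in range(limit))
```

Checking the full input range the kernel can actually see, rather than a few samples, is what would justify treating such a rewrite as output-preserving.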
Requirements
- Outputs must be identical to baseline.
- Prioritize the first three optimizations you would try and justify them.
- Explain how you will measure and report gains after each change.
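A minimal sketch of how the identical-output and per-change reporting requirements above might be combined. The simulator's interface is not specified here, so the sketch assumes cycle counts and kernel outputs have already been collected as plain Python values; `output_digest`, `report`, and the tuple layout are hypothetical names chosen for illustration.

```python
import hashlib


def output_digest(outputs) -> str:
    """Hash the kernel outputs so two runs can be compared exactly."""
    h = hashlib.sha256()
    for value in outputs:
        h.update(repr(value).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def report(baseline_cycles, steps, golden_digest):
    """Return one row per change: (name, cycles, step speedup, cumulative speedup).

    `steps` is a list of (name, cycles, digest) tuples in the order applied.
    Raises ValueError if any step's outputs differ from the baseline.
    """
    rows = []
    previous = baseline_cycles
    for name, cycles, digest in steps:
        if digest != golden_digest:
            raise ValueError(f"{name}: outputs differ from baseline")
        # Step speedup is relative to the previous change; cumulative is
        # relative to the original baseline.
        rows.append((name, cycles, previous / cycles, baseline_cycles / cycles))
        previous = cycles
    return rows
```

Reporting both the step and the cumulative speedup keeps each change's individual contribution visible, and rejecting any run whose digest differs from the baseline means a gain is never recorded for a change that altered the outputs.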