Rigorous Profiling and Experimentation Plan for a Kernel Simulator
You are given only a kernel simulator that reports cycle counts and microarchitectural counters such as IPC, stall reasons, occupancy, and memory bandwidth. Design a rigorous plan to profile and optimize a compute kernel using this simulator.
Provide:
- Baseline definition and environment control.
- Experiment design with controlled variables (including screening vs. deep dives).
- Data collection schema and derived metrics.
- Variance reduction and statistical methodology.
- Stop criteria for iterations.
- Methods to attribute speedup to specific changes (including decomposition and ablation).
- Functional correctness checks after each iteration.
Make minimal, explicit assumptions if necessary to ensure the plan is self-contained.
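
To make the "data collection schema and derived metrics" item concrete, here is a minimal sketch of what one record per simulator run might look like, with a few metrics derived from it. Everything named here is an assumption for illustration: the `RunRecord` fields, the `instructions` and `bytes_moved` counters, and the stall-reason labels are hypothetical and would need to be mapped onto whatever the simulator actually reports.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RunRecord:
    """One simulator run. All field names are hypothetical."""
    variant: str                  # label for the kernel version, e.g. "baseline"
    seed: int                     # input/seed identifier, used to pair runs
    cycles: int                   # total cycle count reported by the simulator
    instructions: int             # retired instructions (assumed available)
    stall_cycles: dict[str, int]  # stall reason -> cycles attributed to it
    bytes_moved: int              # bytes transferred to/from memory (assumed)


def ipc(run: RunRecord) -> float:
    """Instructions per cycle for a single run."""
    return run.instructions / run.cycles


def stall_fractions(run: RunRecord) -> dict[str, float]:
    """Share of total cycles attributed to each stall reason."""
    return {reason: c / run.cycles for reason, c in run.stall_cycles.items()}


def bytes_per_cycle(run: RunRecord) -> float:
    """Achieved memory traffic per cycle."""
    return run.bytes_moved / run.cycles


def paired_speedup(baseline: list[RunRecord], candidate: list[RunRecord]) -> float:
    """Median of per-seed cycle ratios baseline/candidate (>1 means faster).

    Pairing by seed compares each candidate run against the baseline run
    on the same input, which removes input-to-input variation.
    """
    base_cycles = {r.seed: r.cycles for r in baseline}
    ratios = [base_cycles[r.seed] / r.cycles
              for r in candidate if r.seed in base_cycles]
    if not ratios:
        raise ValueError("no seeds shared between baseline and candidate")
    return median(ratios)
```

Storing raw counters per run and computing ratios afterwards keeps the schema stable when new derived metrics are added later. The median of paired ratios is only one possible summary; a full plan would also report an interval around it.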