This question evaluates competency in performance profiling, experimental design, microarchitectural analysis, statistical methodology, and kernel-level performance engineering.

You are given only a kernel simulator that reports cycle counts and microarchitectural counters such as IPC, stall reasons, occupancy, and memory bandwidth. Design a rigorous plan to profile and optimize a compute kernel using this simulator.
Provide:
Make minimal, explicit assumptions if necessary to ensure the plan is self-contained.
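One piece of the "statistical methodology" mentioned above can be sketched concretely: deciding whether a kernel change really reduced cycle counts. The sketch below is an illustration under loud assumptions, not a reference solution. It assumes the simulator can be driven from Python through a hypothetical callable that takes an input seed and returns a cycle count. If the simulator is deterministic for a fixed input, the seeds stand in for a distribution of workloads. The harness runs baseline and candidate on the same inputs (a paired design) and summarizes the per-input ratios as a geometric-mean speedup. It then attaches a percentile bootstrap confidence interval. All function names here are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, Sequence

# Hypothetical simulator hook (an assumption, not a real API): given an input
# seed, run one kernel variant in the simulator and return its cycle count.
RunFn = Callable[[int], int]


def paired_log_speedups(baseline: RunFn, candidate: RunFn,
                        seeds: Sequence[int]) -> list[float]:
    """Run both variants on identical inputs; return log(baseline/candidate) per seed."""
    if len(seeds) == 0:
        raise ValueError("need at least one seed")
    return [math.log(baseline(s) / candidate(s)) for s in seeds]


def bootstrap_ci(samples: Sequence[float], n_resamples: int = 2000,
                 alpha: float = 0.05, rng_seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(rng_seed)  # fixed seed so the analysis itself is reproducible
    n = len(samples)
    means = sorted(statistics.fmean(rng.choices(samples, k=n))
                   for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def compare(baseline: RunFn, candidate: RunFn,
            seeds: Sequence[int]) -> tuple[float, float, float]:
    """Return (geometric-mean speedup, CI lower bound, CI upper bound)."""
    logs = paired_log_speedups(baseline, candidate, seeds)
    lo, hi = bootstrap_ci(logs)
    # Averaging log-ratios and exponentiating gives the geometric mean of ratios.
    return math.exp(statistics.fmean(logs)), math.exp(lo), math.exp(hi)
```

For example, `compare(run_baseline, run_tiled, range(30))` would report whether a hypothetical tiled variant's speedup interval excludes 1.0. Working in log space keeps the summary symmetric: a 2x speedup and a 2x slowdown cancel, which an arithmetic mean of ratios would not do.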