Schedule instructions on a VLIW pipeline

Q: Schedule instructions on a VLIW pipeline

This is a System Design interview question from Anthropic for Software Engineer roles. View the full question and solution on PracHub.

Q: How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

Question

VLIW Instruction Scheduling and Software Pipelining

You are given a 4-issue VLIW machine with these functional units and latencies:

2 ALUs (ALU latency = 1 cycle)
1 Load/Store (LS) unit (load latency = 3 cycles; assume stores commit in 1 cycle and have no consumer)
1 Special-Function Unit (SFU) (latency = 2 cycles)

Goal: Schedule a basic block to maximize parallel issue and minimize stalls. Explain hazard handling (data and structural), discuss register pressure, and then describe a software-pipelined schedule for a looped version. Finally, explain how the schedule changes if average memory latency increases by one cycle (loads: 4 cycles).

Assumptions (to make the problem concrete):

Below is the basic block as a dependency DAG (each node lists op type, output register, and inputs). The DAG is typical of a numeric kernel computing two addresses, loading two operands, combining them, applying a special function, and storing the result. All registers are virtual (renaming allowed); memory addresses are independent except where stated.

Nodes (operation → latency → dependencies):

A1: r5 = r1 + r2 [ALU→1] deps: —
A2: r6 = r3 - r4 [ALU→1] deps: —
L1: r7 = LD [r5] [LS→3] deps: A1
L2: r8 = LD [r6 + 8] [LS→3] deps: A2
A3: r9 = r7 + r8 [ALU→1] deps: L1, L2
S1: r10 = SFU(r9) [SF→2] deps: A3
A4: r11 = r10 * r12 [ALU→1] deps: S1
ST1: ST [r13] = r11 [LS→1] deps: A4
A5: r14 = r15 + r16 [ALU→1] deps: — (independent work to hide latency)
S2: r17 = SFU(r14) [SF→2] deps: A5 (independent special op)

Constraints per cycle (issue width): at most 2 ALUs, 1 LS, and 1 SFU can be issued in the same cycle.

Tasks:

Produce a legal cycle-by-cycle instruction schedule for the basic block that maximizes parallel issue and minimizes stalls.
Explain how you handled hazards (RAW/WAR/WAW, functional-unit conflicts) and how you managed register pressure.
Suppose this basic block is the loop body. Propose a software-pipelined (modulo) schedule for the steady-state kernel; explain your chosen initiation interval (II) and provide a kernel mapping.
Re-do the analysis when average memory latency increases by one cycle (load latency = 4). Explain the changes to the basic-block schedule and to the loop kernel (including any change in II).

Schedule instructions on a VLIW pipeline

VLIW Instruction Scheduling and Software Pipelining

Solution (Locked)

Comments (0)