COMPUTER ORGANIZATION & ARCHITECHTURE: THE PROCESSOR 2 : PIPELINING

PIPELINING

A pipeline is a set of data processing elements

connected in series, so that the output of one

element is the input of the next one.

Pipeline Logic

---> A clock drives all the registers in the pipeline. This clock causes the

CLC output to be latched in the register which provides input to the next stage, and thus making a start of new computation possible for next stage.

---> The maximum clock rate is decided by the time delay of the CLC in the stage and the delay of the staging latch.

PIPELINING ANALOGY

Pipelined laundry: overlapping execution

◦Parallelism improves performance

FIGURE 1

MIPS PIPELINE : Steps in Executing an Instruction

• Instruction Fetch (IF)
– Fetch the next instruction from memory

• Instruction Decode (ID)
– Examine instruction to determine:
– What operation is performed by the instruction (e.g., addition)
– What operands are required, and where the result goes

• Operand Fetch
– Fetch the operands

• Execution (EX)
– Perform the operation on the operands

• Result Writeback (WB)
– Write the result to the specified location

PIPELINE PERFORMANCE

---> Assume time for stages is

100ps for register read or write
200ps for other stages

---> Compare pipelined datapath with single-cycle datapath

FIGURE 2

PIPELINE PERFORMANCE (CONT...)

FIGURE 3

PIPELINE SPEEDUP 1

--->If all stages are balanced

◦ all take the same time

◦Time between instructions(pipelined)

= Time between instructions(nonpipelined)/ Number of stages

--->If not balanced, speedup is less

Speedup due to increased throughput

◦Latency (time for each instruction) does not decrease

Ideally 5 stage pipeline should offer nearly fivefold improvement over the 800 ps nonpipelined time.

PIPELINE PERFORMANCE

Refer to figure 3, the example does not reflect fourfold improvement for three instructions

◦2400/1400 ≈ 1.7

Add 1,000,000 instructions, each add 200 ps to the total execution time,

◦Total execution time = 1,000,000 x 200 ps + 1400 ps
= 200,001,400 ps

◦Nonpipelined total execution time
= 1,000,000 x 800ps + 2400ps
= 800,002,400 ps

◦Speedup = 800,002,400/200,001,400

≈ 4

PIPELINE SPEEDUP 2

FIGURE 4

Critical Path for Different Instructions

HAZARD

---> Situations that prevent starting the next instruction in the next cycle

---> 3 type of hazard are:

Structure hazards

◦A required resource is busy

Data hazard

◦Need to wait for previous instruction to complete its data read/write

Control hazard

◦Deciding on control action depends on previous instruction

STRUCTURE HAZARD

--->Conflict for use of a resource

--->In MIPS pipeline with a single memory

◦Load/store (pipelined) requires data access at the same time
◦Instruction fetch would have to stall for that cycle

Would cause a pipeline “bubble”

--->Solution, pipelined datapaths require separate instruction/data memories

◦Or separate instruction/data caches

DATA HAZARD

An instruction depends on completion of data access by a previous instruction

add $s0, $t0, $t1
sub $t2, $s0, $t3

FIGURE 5

SOLUTION : FORWARDING (aka Bypassing)

--->Use result when it is computed

◦Don’t wait for it to be stored in a register

◦Requires extra connections in the datapath

FIGURE 6

LOAD-USE DATA HAZARD

FIGURE 7

CODE SCHEDULING TO AVOID STALLS

Reorder code to avoid use of load result in the next instruction

C code for A = B + E ; C = B + F ;

FIGURE 8

CONTROL HAZARD

--->Branch determines flow of control

Fetching next instruction depends on branch outcome

Pipeline can’t always fetch correct instruction

Still working on ID stage of branch

--->In MIPS pipeline

Need to compare registers and compute target early in the pipeline

Assume, add hardware to do it in ID stage, test registers, calculate branch address & update PC during 2nd stage of pipeline.

SOLUTION 1 : STALL ON BRANCH

---> Wait until branch outcome determined before fetching next instruction

FIGURE 9

Figure 9 shows a pipeline showing stalling on every conditional branch as solution to control hazards.

This example assumes the conditional branch is taken, and the instruction at the destination of the branch is the OR instruction.

SOLUTION 2 : BRANCH ON PREDICTION

---> Longer pipelines can’t readily determine branch outcome early

Stall penalty becomes unacceptable

--->Predict outcome of branch

Only stall if prediction is wrong

--->In MIPS pipeline

Can predict branches not taken

When you are right, fetch instruction after branch, with no delay

MORE - REALISTIC BRANCH PREDICTION

--->Static branch prediction

Based on typical branch behavior

Example: loop and if-statement branches

Predict backward branches taken

Predict forward branches not taken

--->Dynamic branch prediction

Hardware measures actual branch behavior

e.g., record recent history of each branch (as taken or untaken branch)

Assume future behavior (predict from past behavior) will continue the trend

When wrong, stall while re-fetching, and update history

BY AINI KHAIRANI BT AZMI

Tuesday, December 10, 2013

THE PROCESSOR 2 : PIPELINING

No comments:

Post a Comment