Add ex2 text. Remove unused file.
This commit is contained in:
parent
dd0f1340b5
commit
c82013581d
4 changed files with 114 additions and 45 deletions
BIN
Images/MEMstage.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 24 KiB |
|
@ -1,4 +1,4 @@
|
||||||
* Exercise 1
|
* Exercise description
|
||||||
The task in this exercise is to implement a 5-stage pipelined processor for
|
The task in this exercise is to implement a 5-stage pipelined processor for
|
||||||
the [[./instructions.org][RISCV32I instruction set]].
|
the [[./instructions.org][RISCV32I instruction set]].
|
||||||
|
|
||||||
|
@ -6,7 +6,10 @@
|
||||||
at a time, whereas in exercise 2 your design will handle multiple instructions
|
at a time, whereas in exercise 2 your design will handle multiple instructions
|
||||||
at a time.
|
at a time.
|
||||||
This is done by inserting 4 NOP instructions in between each source instruction,
|
This is done by inserting 4 NOP instructions in between each source instruction,
|
||||||
enabling us to use the same tests for both exercise 1 and 2.
|
enabling us to use the same tests and harness for both exercise 1 and 2.
|
||||||
|
|
||||||
|
Once you are done with exercise 1, you can up the difficulty by setting nopPad
|
||||||
|
to false and start reading the [[exercise2.org][ex2 guide]].
|
||||||
|
|
||||||
In the project skeleton files ([[./src/main/scala/][Found here]]) you can see that a lot of code has
|
In the project skeleton files ([[./src/main/scala/][Found here]]) you can see that a lot of code has
|
||||||
already been provided, which can make it difficult to get started.
|
already been provided, which can make it difficult to get started.
|
||||||
|
|
109
exercise2.org
Normal file
|
@ -0,0 +1,109 @@
|
||||||
|
* Exercise 2
|
||||||
|
Safety wheels are now officially off.
|
||||||
|
To verify this, set nopPadded to false in Manifest.scala and observe as all hell
|
||||||
|
breaks loose.
|
||||||
|
|
||||||
|
Let's break down what's going wrong and what we can do about it.
|
||||||
|
|
||||||
|
** RAW Hazards
|
||||||
|
Consider the following program:
|
||||||
|
#+begin_src asm
|
||||||
|
main:
|
||||||
|
add x1, x1, x2
|
||||||
|
add x1, x1, x1
|
||||||
|
add x1, x1, x2
|
||||||
|
#+end_src
|
||||||
|
|
||||||
|
In your implementation this will give you wrong results since the results
|
||||||
|
of the first add operation will not be available in the registers before
|
||||||
|
x1 is fetched for the second and third operation.
|
||||||
|
|
||||||
|
Your first task should therefore be to implement a forwarding unit.
|
||||||
|
|
||||||
|
The forwarding unit is located in the EX stage and is responsible for selecting
|
||||||
|
the ALU input from three possible sources.
|
||||||
|
These sources are:
|
||||||
|
+ The value received from the register bank
|
||||||
|
+ The ALU result currently in MEM
|
||||||
|
+ The writeback result
|
||||||
|
|
||||||
|
The forwarder prioritizes as follows:
|
||||||
|
+ If the input register address is not the destination in either MEM or WB, select the
|
||||||
|
register.
|
||||||
|
+ If the input register address is the destination register in WB, but not in MEM, select
|
||||||
|
the writeback signal.
|
||||||
|
+ If the input register address is the destination register for the operation currently
|
||||||
|
in MEM, select that operation.
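The priority rules above can be checked in software before committing them to hardware. The following is a minimal Python model, not Chisel. The `StageWrite` tuple, the `reg_write` flag and the special handling of `x0` are assumptions that the text does not spell out.

```python
from typing import NamedTuple


class StageWrite(NamedTuple):
    """Hypothetical summary of what the instruction in MEM or WB will write."""
    reg_write: bool  # assumption: only instructions that write a register can forward
    rd: int          # destination register address
    value: int       # ALU result (MEM) or writeback value (WB)


def forward(rs: int, reg_value: int, mem: StageWrite, wb: StageWrite) -> int:
    """Select the ALU operand for source register address rs."""
    # x0 is hard-wired to zero in RISC-V, so it is never forwarded.
    if rs == 0:
        return 0
    # MEM holds the most recent producer, so it takes priority over WB.
    if mem.reg_write and mem.rd == rs:
        return mem.value
    if wb.reg_write and wb.rd == rs:
        return wb.value
    # No pending write to rs: use the value read from the register bank.
    return reg_value
```

Checking MEM before WB matters for sequences like the three `add` instructions above, where both stages may hold a pending write to `x1` and only the newest one is correct.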
|
||||||
|
|
||||||
|
There is a special case you need to take into account, namely load operations.
|
||||||
|
Consider the following program:
|
||||||
|
#+begin_src asm
|
||||||
|
main:
|
||||||
|
lw x1, 0(x2)
|
||||||
|
add x1, x1, x1
|
||||||
|
add x1, x1, x1
|
||||||
|
#+end_src
|
||||||
|
|
||||||
|
When the second operation (~add x1, x1, x1~) is in EX, the third clause in the forwarder
|
||||||
|
is triggered. However, the result is not yet ready, since reading from memory costs a cycle.
|
||||||
|
In order to fix this the forwarder must issue a signal that freezes the pipeline.
|
||||||
|
This is done by issuing a signal to the barrier registers, telling them to _not_ update
|
||||||
|
their contents, essentially repeating the last instruction.
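The freeze condition can be sketched as a small Python predicate. This is only a model under stated assumptions: the function and parameter names are hypothetical, and it assumes both source fields of the instruction in EX are actually used as registers.

```python
def load_use_stall(ex_rs1: int, ex_rs2: int, mem_is_load: bool, mem_rd: int) -> bool:
    """Return True when the instruction in EX must wait for a load in MEM.

    Assumption: ex_rs1 and ex_rs2 are both real register reads. An
    instruction that only uses one source would need an extra "uses rs2"
    style flag to avoid stalling needlessly.
    """
    # Only loads produce their result too late to be forwarded from MEM.
    if not mem_is_load:
        return False
    # Writes to x0 are discarded, so they can never cause a hazard.
    if mem_rd == 0:
        return False
    return mem_rd in (ex_rs1, ex_rs2)
```

For the program above, `load_use_stall(1, 1, True, 1)` is true while `lw x1, 0(x2)` is in MEM and the first `add` is in EX.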
|
||||||
|
|
||||||
|
There are many subtleties to consider here.
|
||||||
|
For instance: What should happen to the instruction currently
|
||||||
|
in MEM? If it too is repeated, the hazard detector will trigger again next cycle, effectively
|
||||||
|
deadlocking your processor.
|
||||||
|
What about when the write address and read address are the same in the ID stage?
|
||||||
|
|
||||||
|
Designing a forwarder can take very long, or it can be done very quickly, all depending
|
||||||
|
on how *methodical* you are. My advice is to design the algorithm first, then when you're
|
||||||
|
satisfied, implement it in hardware.
|
||||||
|
|
||||||
|
|
||||||
|
** Control hazards
|
||||||
|
|
||||||
|
Consider the following code:
|
||||||
|
|
||||||
|
#+begin_src asm
|
||||||
|
main:
|
||||||
|
beq zero, zero, target
|
||||||
|
add x1, x1, x1
|
||||||
|
add x1, x1, x1
|
||||||
|
j main
|
||||||
|
target:
|
||||||
|
sub x1, x2, x2
|
||||||
|
sub x2, x2, x2
|
||||||
|
#+end_src
|
||||||
|
|
||||||
|
Depending on your design, the two add instructions will be fetched before the beq branch is resolved.
|
||||||
|
Whenever a branch happens it is necessary to flush the spurious instructions that were fetched
|
||||||
|
before the branch was noticed.
|
||||||
|
However, simply waiting until the branch has been decided is not acceptable since that guarantees
|
||||||
|
lost cycles.
|
||||||
|
In your first iteration, simply assume branches will not be taken, and if they are taken issue
|
||||||
|
a signal to the barriers holding the spurious instructions, telling them to neutralize those instructions
|
||||||
|
by setting the control signals to do nothing.
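A minimal Python sketch of this flush, assuming a hypothetical `Controls` bundle with four boolean signals (your own barriers will carry different fields):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    """Hypothetical control signals carried by a barrier register."""
    reg_write: bool = False
    mem_read: bool = False
    mem_write: bool = False
    branch: bool = False


# With every signal deasserted the instruction changes no architectural state.
BUBBLE = Controls()


def next_barrier(incoming: Controls, branch_taken: bool) -> Controls:
    """Value a barrier latches this cycle under the predict-not-taken scheme."""
    # On a taken branch the incoming instruction was fetched from the wrong
    # path, so it is replaced with a bubble instead of being passed on.
    return BUBBLE if branch_taken else incoming
```

The data fields of the squashed instruction can stay as they are. Only the signals that cause side effects (register writes, memory accesses, further branches) need to be cleared.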
|
||||||
|
|
||||||
|
* Putting the pedal to the metal
|
||||||
|
Once you have a design that handles RAW and control hazards, you're ready to up the ante and add some
|
||||||
|
improvements to your design.
|
||||||
|
Some suggestions:
|
||||||
|
|
||||||
|
** Branch predictor
|
||||||
|
Instead of assuming branch not taken, use a branch predictor. There are many different schemes,
|
||||||
|
but I advise you to stick to a simple one, such as 1-bit or 2-bit.
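As an illustration of the 2-bit scheme, here is a small Python model of a table of saturating counters. The table size, the indexing by word-aligned PC and the initial counter value are assumptions made for this sketch.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries: int = 16) -> None:
        self.entries = entries
        # Assumption: start every counter at 1 (weakly not taken).
        self.counters = [1] * entries

    def _index(self, pc: int) -> int:
        # Instructions are 4 bytes wide, so drop the two low PC bits.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Compared with a 1-bit predictor, a counter that has saturated at 3 needs two consecutive mispredictions before it flips, which helps with loops that exit once and are then entered again.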
|
||||||
|
|
||||||
|
** Fast branch handling
|
||||||
|
Certain branches like BEQ and BNE can be calculated very quickly, whereas size comparison branches
|
||||||
|
(BLT, BGE and friends) take longer. It is therefore feasible to do the equality checks in the ID stage.
|
||||||
|
|
||||||
|
** Adding a data cache
|
||||||
|
Unless you have already done the two suggested improvements, do not attempt to create a cache.
|
||||||
|
The first thing you need to do is to add a latency for memory fetching; otherwise you will have
|
||||||
|
nothing to improve upon.
|
||||||
|
|
||||||
|
If you still insist, start with the BTreeManyO3.s program and analyze the memory access pattern.
|
||||||
|
What sort of eviction policy and cache size would you choose for this pattern?
|
||||||
|
You should try writing additional benchmarking programs. It is important to have something measurable,
|
||||||
|
and the current programs are not made to test cache performance.
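One way to get something measurable before touching the hardware is to replay a recorded list of memory addresses through a software cache model. The sketch below assumes a fully associative cache with LRU eviction and 4-byte lines. The function name and parameters are hypothetical, and the address trace is something you would have to extract from your own test runs.

```python
from collections import OrderedDict
from typing import Iterable


def lru_hits(addresses: Iterable[int], num_lines: int, line_bytes: int = 4) -> int:
    """Count hits for a fully associative LRU cache with num_lines lines."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    hits = 0
    for addr in addresses:
        block = addr // line_bytes
        if block in cache:
            hits += 1
            # Mark this block as the most recently used one.
            cache.move_to_end(block)
        else:
            if len(cache) >= num_lines:
                # Evict the least recently used block.
                cache.popitem(last=False)
            cache[block] = True
    return hits
```

Running the same trace with different values of `num_lines`, or with the eviction rule swapped out, gives a rough comparison of policies without having to simulate the full design.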
|
|
@ -1,43 +0,0 @@
|
||||||
// package FiveStage
|
|
||||||
// import chisel3._
|
|
||||||
// import chisel3.iotesters._
|
|
||||||
// import org.scalatest.{Matchers, FlatSpec}
|
|
||||||
// import spire.math.{UInt => Uint}
|
|
||||||
// import fileUtils._
|
|
||||||
// import cats.implicits._
|
|
||||||
|
|
||||||
// import RISCVutils._
|
|
||||||
// import RISCVasm._
|
|
||||||
// import riscala._
|
|
||||||
|
|
||||||
// import utilz._
|
|
||||||
|
|
||||||
// class AllTests extends FlatSpec with Matchers {
|
|
||||||
|
|
||||||
// val results = fileUtils.getAllTests.map{f =>
|
|
||||||
// val result = TestRunner.runTest(f.getPath, false)
|
|
||||||
// (f.getName, result)
|
|
||||||
// }
|
|
||||||
|
|
||||||
// makeReport(results)
|
|
||||||
// }
|
|
||||||
|
|
||||||
|
|
||||||
// /**
|
|
||||||
// This is for you to run more verbose testing.
|
|
||||||
// */
|
|
||||||
// class SelectedTests extends FlatSpec with Matchers {
|
|
||||||
|
|
||||||
// val tests = List(
|
|
||||||
// "matMul.s"
|
|
||||||
// )
|
|
||||||
|
|
||||||
// if(!tests.isEmpty){
|
|
||||||
// val results = fileUtils.getAllTests.filter(f => tests.contains(f.getName)).map{ f =>
|
|
||||||
// val result = TestRunner.runTest(f.getPath, true)
|
|
||||||
// (f.getName, result)
|
|
||||||
// }
|
|
||||||
|
|
||||||
// makeReport(results)
|
|
||||||
// }
|
|
||||||
// }
|
|