Add ex2 text. Remove unused file.

peteraa 2019-09-05 19:18:40 +02:00
parent dd0f1340b5
commit c82013581d
4 changed files with 114 additions and 45 deletions

BIN
Images/MEMstage.png Normal file

Binary file not shown.

Size: 24 KiB


@ -1,4 +1,4 @@
-* Exercise 1
+* Exercise description
 The task in this exercise is to implement a 5-stage pipelined processor for
 the [[./instructions.org][RISCV32I instruction set]].
@ -6,7 +6,10 @@
 at a time, whereas in exercise 2 your design will handle multiple instructions
 at a time.
 This is done by inserting 4 NOP instructions inbetween each source instruction,
-enabling us to use the same tests for both exercise 1 and 2.
+enabling us to use the same tests and harness for both exercise 1 and 2.
+Once you are done with exercise 1, you can up the difficulty by setting nopPad
+to false and start reading the [[exercise2.org][ex2 guide]].
 In the project skeleton files ([[./src/main/scala/][Found here]]) you can see that a lot of code has
 already been provided, which can make it difficult to get started.

109
exercise2.org Normal file

@ -0,0 +1,109 @@
* Exercise 2
Safety wheels are now officially off.
To verify this, set nopPadded to false in Manifest.scala and observe as all hell
breaks loose.
Let's break down what's going wrong and what we can do about it.
** RAW Hazards
Consider the following program:
#+begin_src asm
main:
add x1, x1, x2
add x1, x1, x1
add x1, x1, x2
#+end_src
In your implementation this will give you wrong results, since the result
of the first add has not yet been written back to the register bank when
x1 is read for the second and third instructions.
Your first task should therefore be to implement a forwarding unit.
The forwarding unit is located in the EX stage and is responsible for selecting
the ALU input from three possible sources.
These sources are:
+ The value received from the register bank
+ The ALU result currently in MEM
+ The writeback result
The forwarder prioritizes as follows:
+ If the input register address is not the destination in either MEM or WB, select the
register.
+ If the input register address is the destination register in WB, but not in MEM, select
the writeback signal.
+ If the input register address is the destination register for the operation currently
in MEM, select the ALU result in MEM, since it is the most recent value.
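The priority list above can be sketched as a small behavioural model. This is plain Scala rather than Chisel, and every name in it (~ForwardModel~, ~select~, ~memWrites~, ~wbWrites~) is made up for illustration, not taken from the skeleton code. It also assumes that you track whether the instructions in MEM and WB write a register at all, and it never forwards x0, which is hardwired to zero in RISC-V.
#+begin_src scala
// Behavioural model of the forwarding choice for one ALU operand.
// Plain Scala, not Chisel; all names are illustrative assumptions.
object ForwardModel {
  sealed trait Source
  case object FromRegisters extends Source // value read from the register bank
  case object FromMem       extends Source // ALU result currently in MEM
  case object FromWriteback extends Source // value being written back in WB

  // rs:                 register address read by the instruction in EX
  // memRd, wbRd:        destination addresses of the instructions in MEM and WB
  // memWrites, wbWrites: whether those instructions write a register at all
  def select(rs: Int, memRd: Int, memWrites: Boolean,
             wbRd: Int, wbWrites: Boolean): Source =
    if (rs == 0) FromRegisters                     // x0 is always zero, never forward it
    else if (memWrites && memRd == rs) FromMem     // newest value wins
    else if (wbWrites && wbRd == rs) FromWriteback // older value, still not in the bank
    else FromRegisters
}
#+end_src
You need one such selection per ALU operand. Checking MEM before WB is what makes the third ~add~ in the example program pick up the result of the second ~add~ rather than the first.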
There is a special case you need to take into account, namely load operations.
Consider the following program:
#+begin_src asm
main:
lw x1, 0(x2)
add x1, x1, x1
add x1, x1, x1
#+end_src
When the second operation (~add x1, x1, x1~) is in EX, the third clause in the forwarder
is triggered. However, the result is not yet ready, since fetching from memory costs a cycle.
In order to fix this, the forwarder must issue a signal that freezes the pipeline.
This is done by issuing a signal to the barrier registers, telling them to _not_ update
their contents, essentially repeating the last instruction.
There are many subtleties to consider here.
For instance: What should happen to the instruction currently
in MEM? If it too is repeated the hazard detector will trigger next cycle, effectively
deadlocking your processor.
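One possible way to reason about the freeze is sketched below as a plain Scala model of just the EX and MEM slots. The names (~LoadUseModel~, ~Instr~, ~Bubble~, ~mustStall~, ~step~) are hypothetical, and the choice to let MEM drain and receive a bubble is one assumed answer to the question above, not necessarily the only workable one.
#+begin_src scala
// Behavioural model of a load-use stall. Plain Scala, not Chisel;
// all names are illustrative assumptions.
object LoadUseModel {
  // Minimal view of an instruction sitting in a pipeline stage.
  final case class Instr(name: String, rd: Int, rs1: Int, rs2: Int, isLoad: Boolean)

  // A bubble writes nothing and reads only x0.
  val Bubble = Instr("nop", 0, 0, 0, isLoad = false)

  // True when the instruction in EX reads a register that a load in MEM
  // has not produced yet.
  def mustStall(ex: Instr, mem: Instr): Boolean =
    mem.isLoad && mem.rd != 0 && (mem.rd == ex.rs1 || mem.rd == ex.rs2)

  // One clock edge for the EX and MEM slots, returning the new (ex, mem) pair.
  // On a stall EX keeps its instruction, while the load leaves MEM and a
  // bubble takes its place, so the hazard cannot trigger again next cycle.
  def step(incoming: Instr, ex: Instr, mem: Instr): (Instr, Instr) =
    if (mustStall(ex, mem)) (ex, Bubble) else (incoming, ex)
}
#+end_src
After the stalled cycle the load has moved on to WB, so the repeated ~add~ can pick up the loaded value through the ordinary writeback forwarding path.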
What about when the write address and read address are the same in the ID stage?
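A common way to handle this case is to bypass the register bank, so that a read sees the value being written in the same cycle. The sketch below models that idea in plain Scala; ~RegReadModel~ and its parameters are hypothetical names, and whether you need this at all depends on how your register bank behaves.
#+begin_src scala
// Behavioural model of a register read with a same-cycle write bypass.
// Plain Scala, not Chisel; all names are illustrative assumptions.
object RegReadModel {
  // regs:     current contents of the register bank (32 entries)
  // readAddr: register address read in ID
  // wbAddr, wbData, wbEnable: the write performed by WB in the same cycle
  def read(regs: Vector[Int], readAddr: Int,
           wbAddr: Int, wbData: Int, wbEnable: Boolean): Int =
    if (readAddr == 0) 0                             // x0 always reads as zero
    else if (wbEnable && wbAddr == readAddr) wbData  // bypass the pending write
    else regs(readAddr)
}
#+end_src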
Designing a forwarder can take a very long time, or it can be done very quickly, all depending
on how *methodical* you are. My advice is to design the algorithm first, then when you're
satisfied implement it in hardware.
** Control hazards
Consider the following code:
#+begin_src asm
main:
beq zero, zero, target
add x1, x1, x1
add x1, x1, x1
j main
target:
sub x1, x2, x2
sub x2, x2, x2
#+end_src
Depending on your design, the two add instructions will be fetched before the beq branch is resolved.
Whenever a branch is taken it is necessary to flush the spurious instructions that were fetched
before the branch was noticed.
However, simply waiting until the branch has been decided is not acceptable, since that guarantees
lost cycles.
In your first iteration, simply assume branches will not be taken. If a branch is taken after all,
issue a signal to the barriers that hold spurious instructions, telling them to neutralize those
instructions by setting their control signals to do nothing.
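The flush described above can be modelled in a few lines of plain Scala. The control bundle here (~Ctrl~ with ~regWrite~, ~memWrite~ and ~branch~) is a made-up stand-in for whatever control signals your design carries, and ~FlushModel~, ~nextBarrier~ and ~nextPc~ are hypothetical names.
#+begin_src scala
// Behavioural model of "assume not taken" with a flush on a taken branch.
// Plain Scala, not Chisel; all names are illustrative assumptions.
object FlushModel {
  // Stand-in for the control signals stored in a barrier.
  final case class Ctrl(regWrite: Boolean, memWrite: Boolean, branch: Boolean)

  // Control signals that make an instruction do nothing.
  val Nop = Ctrl(regWrite = false, memWrite = false, branch = false)

  // Contents of a barrier after the clock edge: when a taken branch is
  // detected, the wrong-path instruction entering it is replaced by a NOP.
  def nextBarrier(incoming: Ctrl, branchTaken: Boolean): Ctrl =
    if (branchTaken) Nop else incoming

  // Next program counter: keep fetching sequentially unless a branch is taken.
  def nextPc(pc: Int, branchTaken: Boolean, target: Int): Int =
    if (branchTaken) target else pc + 4
}
#+end_src
Every barrier that holds an instruction fetched after the branch needs this treatment in the cycle the branch is resolved. How many barriers that is depends on the stage in which your design decides the branch.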
* Putting the pedal to the metal
Once you have a design that handles RAW and control hazards, you're ready to up the ante and add some
improvements to your design.
Some suggestions:
** Branch predictor
Instead of assuming branch not taken, use a branch predictor. There are many different schemes,
but I advise you to stick to a simple one, such as a 1-bit or 2-bit predictor.
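As a reference for the 2-bit scheme, here is a minimal plain Scala model of a single saturating counter. The state encoding (0 and 1 predict not taken, 2 and 3 predict taken) and the names ~TwoBitPredictor~, ~predictTaken~, ~update~ and ~mispredictions~ are assumptions chosen for this sketch. A real predictor would keep one such counter per table entry, indexed by some bits of the branch address.
#+begin_src scala
// Behavioural model of one 2-bit saturating counter.
// Plain Scala, not Chisel; all names are illustrative assumptions.
object TwoBitPredictor {
  // States 0..3: 0 and 1 predict not taken, 2 and 3 predict taken.
  def predictTaken(state: Int): Boolean = state >= 2

  // Saturating update with the actual branch outcome.
  def update(state: Int, taken: Boolean): Int =
    if (taken) math.min(state + 1, 3) else math.max(state - 1, 0)

  // Counts mispredictions over a sequence of outcomes for a single branch.
  def mispredictions(outcomes: Seq[Boolean], initial: Int = 0): Int = {
    val (_, misses) = outcomes.foldLeft((initial, 0)) {
      case ((state, missCount), taken) =>
        val missed = predictTaken(state) != taken
        (update(state, taken), if (missed) missCount + 1 else missCount)
    }
    misses
  }
}
#+end_src
Feeding ~mispredictions~ the outcome sequence of a loop branch is a quick way to compare schemes before committing one to hardware.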
** Fast branch handling
Certain branches like BEQ and BNE can be calculated very quickly, whereas size comparison branches
(BLT, BGE and friends) take longer. It is therefore feasible to do the equality checks in the ID stage.
** Adding a data cache
Unless you have already done the two suggested improvements, do not attempt to create a cache.
The first thing you need to do is to add latency to memory fetches. Without it you will have
nothing to improve upon.
If you still insist, start with the BTreeManyO3.s program and analyze the memory access pattern.
What sort of eviction policy and cache size would you choose for this pattern?
You should try writing additional benchmarking programs. It is important to have something measurable,
and the current programs are not made to test cache performance!
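To get something measurable before building hardware, you can replay a memory access trace through a software cache model. The sketch below is a plain Scala model of a direct-mapped cache, which is the simplest baseline since it has no eviction policy to choose. ~CacheModel~, ~hitRate~ and its parameters are hypothetical names, and how you record the address trace is left open.
#+begin_src scala
// Behavioural model of a direct-mapped cache, used only to count hits.
// Plain Scala, not Chisel; all names are illustrative assumptions.
object CacheModel {
  // Hit rate of a cache with `lines` lines of `blockBytes` bytes each,
  // over a trace of non-negative byte addresses.
  def hitRate(trace: Seq[Int], lines: Int, blockBytes: Int): Double = {
    val stored = Array.fill(lines)(-1) // block number held by each line, -1 if empty
    var hits = 0
    for (addr <- trace) {
      val block = addr / blockBytes    // which memory block the address falls in
      val index = block % lines        // the single line that block may occupy
      if (stored(index) == block) hits += 1
      else stored(index) = block       // miss: replace whatever was there
    }
    if (trace.isEmpty) 0.0 else hits.toDouble / trace.size
  }
}
#+end_src
Running the same trace with different values of ~lines~ and ~blockBytes~ gives a first impression of which cache size is worth implementing.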


@ -1,43 +0,0 @@
// package FiveStage
// import chisel3._
// import chisel3.iotesters._
// import org.scalatest.{Matchers, FlatSpec}
// import spire.math.{UInt => Uint}
// import fileUtils._
// import cats.implicits._
// import RISCVutils._
// import RISCVasm._
// import riscala._
// import utilz._
// class AllTests extends FlatSpec with Matchers {
// val results = fileUtils.getAllTests.map{f =>
// val result = TestRunner.runTest(f.getPath, false)
// (f.getName, result)
// }
// makeReport(results)
// }
// /**
// This is for you to run more verbose testing.
// */
// class SelectedTests extends FlatSpec with Matchers {
// val tests = List(
// "matMul.s"
// )
// if(!tests.isEmpty){
// val results = fileUtils.getAllTests.filter(f => tests.contains(f.getName)).map{ f =>
// val result = TestRunner.runTest(f.getPath, true)
// (f.getName, result)
// }
// makeReport(results)
// }
// }