* Exercise description
The task in this exercise is to implement a 5-stage pipelined processor for
the [[./instructions.org][RISCV32I instruction set]].

For exercise 1 you will build a 5-stage processor which handles one instruction
at a time, whereas in exercise 2 your design will handle multiple instructions
at a time.
In exercise 1 this is achieved by inserting 4 NOP instructions between each source
instruction, enabling us to use the same tests and harness for both exercise 1 and 2.
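
The padding idea can be illustrated in plain Scala. This is only a sketch: ~NopPadding~ and ~pad~ are made-up names, and the real test harness may pad differently (for instance, it may not pad after the last instruction).

#+BEGIN_SRC scala
object NopPadding {
  // Pseudo-instruction for "do nothing"; in RV32I it expands to addi x0, x0, 0.
  val Nop = "nop"

  // Follow every source instruction with `count` NOPs, so that an instruction
  // has passed through all five stages before the next one is fetched.
  // Assumption: padding is also added after the last instruction.
  def pad(program: Seq[String], count: Int = 4): Seq[String] =
    program.flatMap(instruction => instruction +: Seq.fill(count)(Nop))
}
#+END_SRC

With this sketch, ~NopPadding.pad(Seq("addi x1, x0, 5", "add x2, x1, x1"))~ yields ten entries: each source instruction followed by four ~nop~ entries.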

Once you are done with exercise 1, you can up the difficulty by setting ~nopPadded~
to false and start reading the [[exercise2.org][ex2 guide]].

In the project skeleton files ([[./src/main/scala/][found here]]) you can see that a lot of code has
already been provided, which can make it difficult to get started.
Hopefully this document can help clear up at least some of the confusion.
First an overview of what you are designing is presented, followed by a walk-through
for getting the most basic instructions to work.

In order to orient yourself you first need a map, thus a high level overview of the
processor you're going to design is shown underneath.
Keep in mind that this is just a high level sketch, omitting many details as well
as entire features (for instance branch logic).

#+CAPTION: A very high level processor schematic. Registers, instruction and data memory are already implemented.
[[./Images/FiveStage.png]]

Now that you have an idea of what you're building it is time to take inventory of
the files included in the skeleton, and what, if anything, should be added.

+ [[./src/main/scala/Tile.scala]]
  This is the top level module for the system as a whole. This is where the test
  harness accesses your design, providing the necessary IO.
  *You should not modify this module for other purposes than debugging.*

+ [[./src/main/scala/CPU.scala]]
  This is the top level module for your processor.
  In this module the various stages and barriers that make up your processor
  should be declared and wired together.
  Some of these modules have already been declared in order to wire up the
  debugging logic for your test harness.
  This file corresponds to the high-level overview in its entirety.
  *This module is intended to be further fleshed out by you.*
  As you work with this module, try keeping logic to a minimum to help readability.
  If you end up with a lot of signal select logic, consider moving that to a separate
  module.

+ [[./src/main/scala/IF.scala]]
  This is the instruction fetch stage.
  In this stage instruction fetching should happen, meaning you will have to
  add logic for handling branches, jumps, and for exercise 2, stalls.
  The reason this module is already included is that it contains the instruction
  memory, described next, which is heavily coupled to the testing harness.
  *This module is intended to be further fleshed out by you.*

+ [[./src/main/scala/IMem.scala]]
  This module contains the instruction memory for your processor.
  Upon testing, the test harness loads your program into the instruction memory,
  freeing you from the hassle.
  *You should not modify this module for other purposes than maaaaybe debugging.*

+ [[./src/main/scala/ID.scala]]
  The instruction decode stage.
  The reason this module is included is that the registers reside here, thus
  for the test harness to work it must be wired up to the register unit to
  record its state updates.
  *This module is intended to be further fleshed out by you.*

+ [[./src/main/scala/Registers.scala]]
  Contains the registers for your processor. Note that the zero register is already
  disabled, so you do not need to do this yourself.
  The test harness ensures that all register updates are recorded.
  *You should not modify this module for other purposes than maaaaybe debugging.*

+ [[./src/main/scala/MEM.scala]]
  Like ID and IF, the MEM skeleton module is included so that the test harness
  can set up and monitor the data memory.
  *This module is intended to be further fleshed out by you.*

+ [[./src/main/scala/DMem.scala]]
  Like the registers and IMem, the DMem is already implemented.
  *You should not modify this module for other purposes than maaaaybe debugging.*

+ [[./src/main/scala/Const.scala]]
  Contains helpful constants for decoding, used by the decoder which is provided.
  *This module may be fleshed out further by you if you so choose.*

+ [[./src/main/scala/Decoder.scala]]
  The decoder shows how to conveniently demux the instruction.
  In the provided ID.scala file a decoder module has already been instantiated.
  You should flesh it out further.
  You may find it useful to alter this module, especially in exercise 2.
  *This module should be further fleshed out by you.*

+ [[./src/main/scala/ToplevelSignals.scala]]
  Contains helpful constants.
  You should add your own constants here when you find the need for them.
  You are not required to use it at all, but it is very helpful.
  *This module can be further fleshed out by you.*

+ [[./src/main/scala/SetupSignals.scala]]
  You should obviously not modify this file.
  You may choose to create a similar file for debug signals, modeled on how
  the test harness is built.
  *You should not modify this module at all.*


** Tests
In addition to the skeleton files it's useful to take a look at how the tests work.
You will not need to alter anything here other than the [[./src/test/scala/Manifest.scala][test manifest]], but some
of these settings can be quite useful to alter.
The main attraction is the test options. By altering the verbosity settings you
may change what is output.
The settings are:

+ printIfSuccessful
  Enables logging on tests that succeed.
  You typically want this turned off, at least for the full test runner.

+ printErrors
  Enables logging of errors. You obviously want this one on, at least on the single
  test.

+ printParsedProgram
  Prints the desugared program. Useful when the test asm contains instructions that
  need to be expanded or altered.
  Unsure what "bnez" means? Turn this setting on and see!

+ printVMtrace
  Enables printing of the VM trace, showing how the ideal machine executes a test.

+ printVMfinal
  Enables printing of the final VM state, showing how the registers look after
  completion. Useful if you want to see what a program returns.

+ printMergedTrace
  Enables printing of a merged trace. With this option enabled you get to see how
  the VM and your processor executed the program side by side.
  This setting is extremely helpful to track down where your program goes wrong!
  This option attempts to synchronize the execution traces as best as it can, however
  once your processor design derails this becomes impossible, leading to rather
  nonsensical output.
  Instructions that were executed only by the VM or only by your design are colored
  red or blue.

  *IF YOU ARE COLOR BLIND YOU SHOULD ALTER THE DISPLAY COLORS!*

+ nopPadded
  Set this to false when you're ready to enter the big-boy league.

+ breakPoints
  Not implemented. It's there as a teaser, urging you to implement it so I don't have to.
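
To make the advice in the list above concrete, here is a plain Scala sketch of two option sets. Note that ~SketchOptions~ and ~SketchManifest~ are hypothetical stand-ins written for this illustration; only the field names come from the list above, and the actual types and layout in the test manifest may differ.

#+BEGIN_SRC scala
// Hypothetical stand-in for the test options; only the field names are
// taken from the settings list, the rest is an assumption.
case class SketchOptions(
  printIfSuccessful:  Boolean,
  printErrors:        Boolean,
  printParsedProgram: Boolean,
  printVMtrace:       Boolean,
  printVMfinal:       Boolean,
  printMergedTrace:   Boolean,
  nopPadded:          Boolean
)

object SketchManifest {
  // Verbose settings for chasing a bug in a single test.
  val singleTest = SketchOptions(
    printIfSuccessful  = true,
    printErrors        = true,
    printParsedProgram = false,
    printVMtrace       = false,
    printVMfinal       = false,
    printMergedTrace   = true,
    nopPadded          = true
  )

  // Quieter settings for the full test runner: no output for passing tests
  // and no merged traces.
  val fullRun = singleTest.copy(
    printIfSuccessful = false,
    printMergedTrace  = false
  )
}
#+END_SRC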


** Getting started
In order to make a correct design in a somewhat expedient fashion you need to be
*methodical!*

This means you should have a good idea of how your processor should work *before*
you start writing code. While Chisel is more pleasant to work with than other HDLs,
the [[https://i.imgur.com/6IpVNA7.jpg][bricoleur]] approach is not recommended.

My recommended approach is therefore to create an RTL sketch of your processor design.
Start with an overall sketch showing all the components, then drill down.
In your sketch you will eventually add a box for registers, IMEM and DMEM, which
should make it clear how the already finished modules fit into the grander design,
making the skeleton-code less mysterious.

To give you an idea of what a drill down looks like, here is my sketch of the ID stage:
#+CAPTION: Instruction decode stage, showing the various signals.
[[./Images/IDstage.png]]

I would generally advise doing these on paper, but don't half-ass them.


** Adding numbers
In order to get started designing your processor the following steps guide you to
implementing the necessary functionality for adding two integers.

Info is progressively being omitted in the later steps in order to not bog you down
in repeated details. After all, brevity is +the soul of+ wit.

*** Step 0
In order to verify that the project is set up properly, open sbt in your project root
by typing ~./sbt.sh~ (or simply ~sbt~ if you already use Scala).
sbt, which stands for scala build tool, will provide you with a REPL where you can
compile and test your code.

The initial run will take quite a while to boot as all the necessary stuff is downloaded.

**** Step ¼:
In your console, type ~compile~ to verify that everything compiles correctly.

**** Step ½:
In your console, type ~test~ to verify that the tests run, and that chisel can correctly
build your design.
This command will unleash the full battery of tests on you.

**** Step ¾:
In your console, type ~testOnly FiveStage.SingleTest~ to run only the tests that you
have defined in the [[./src/test/scala/Manifest.scala][test manifest]] (currently set to ~forward2.s~).

As you will first implement addition you should change this to the [[./src/test/resources/tests/basic/immediate/addi.s][add immediate test]].
Luckily you do not have to deal with file paths, simply changing ~forward2.s~ to
~addi.s~ suffices.

Ensure that the addi test is run by repeating the ~testOnly FiveStage.SingleTest~
command.

*** Step 1:
In order to execute instructions your processor must be able to fetch them.
In [[./src/main/scala/IF.scala]] you can see that the IMEM module is already set to fetch
the current program counter address (line 41), however since the current PC is stuck
at 0 it will fetch the same instruction over and over. Rectify this by uncommenting
~// PC := PC + 4.U~ at line 48.
You can now verify that your design fetches new instructions each cycle by running
the test as in the previous step.

*** Step 2:
Next, the instruction must be forwarded to the ID stage, so you will need to add the
instruction to the io interface of the IF module as an output signal.
In [[./src/main/scala/IF.scala]] at line 21 you can see how the program counter is already
defined as an output.
You should do the same with the instruction signal.
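
As a rough sketch of what such an interface could look like, the module below exposes both the PC and the instruction as outputs. It is not the skeleton's IF module: ~IFSketch~ and ~imemData~ are hypothetical names, the instruction memory is replaced by a plain input, and ~UInt(32.W)~ stands in for the skeleton's own instruction type.

#+BEGIN_SRC scala
import chisel3._

// Hypothetical stand-in for the fetch stage, reduced to its interface.
class IFSketch extends Module {
  val io = IO(new Bundle {
    // Assumption: the IMEM read value arrives here instead of from a real memory.
    val imemData    = Input(UInt(32.W))
    val PC          = Output(UInt(32.W))
    // The new output, declared the same way as the PC output.
    val instruction = Output(UInt(32.W))
  })

  // Program counter register, starting at 0 and stepping one instruction
  // (4 bytes) per cycle.
  val PC = RegInit(0.U(32.W))
  PC := PC + 4.U

  io.PC          := PC
  io.instruction := io.imemData
}
#+END_SRC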


*** Step 3:
As you defined the instruction as an output for your IF module, declare it as an input
in your ID module ([[./src/main/scala/ID.scala]] line 21).

Next you need to ensure that the registers and decoder get the relevant data from the
instruction.

This is made more convenient by the fact that ~Instruction~ is a class, allowing you
to access methods defined on it.
Keep in mind that it is only a class during compile and build time. It will be
indistinguishable from a regular ~UInt(32.W)~ in your finished circuit.
The methods can be accessed like this:
#+BEGIN_SRC scala
// Drive funct6 of myModule with bits 26 through 31 of the instruction
myModule.io.funct6 := io.instruction.funct6
#+END_SRC
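
When checking that the decoder and registers receive the right values, it can help to know which bits each field occupies. The plain Scala model below (no Chisel) extracts the standard RV32I fields from an instruction word so you can compute the expected values by hand. ~InstructionFields~ is a made-up helper for illustration and is not part of the skeleton; the accessor names in the skeleton's ~Instruction~ class may differ.

#+BEGIN_SRC scala
object InstructionFields {
  // Extract bits hi..lo (inclusive) of a 32-bit word held in a Long.
  def bits(word: Long, hi: Int, lo: Int): Long =
    (word >> lo) & ((1L << (hi - lo + 1)) - 1)

  def opcode(word: Long): Long = bits(word, 6, 0)
  def rd(word: Long): Long     = bits(word, 11, 7)
  def funct3(word: Long): Long = bits(word, 14, 12)
  def rs1(word: Long): Long    = bits(word, 19, 15)
  def rs2(word: Long): Long    = bits(word, 24, 20)
  def funct6(word: Long): Long = bits(word, 31, 26)

  // I-type immediate: bits 31..20, sign-extended from 12 bits.
  def immI(word: Long): Long = {
    val raw = bits(word, 31, 20)
    if ((raw & 0x800L) != 0) raw - 0x1000L else raw
  }
}
#+END_SRC

For example, ~addi x1, x0, 5~ is encoded as ~0x00500093~, for which this model gives opcode ~0x13~, rd 1, rs1 0 and an immediate of 5.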

*** Step 4:
Your IF should now have an instruction as an OUTPUT, and your ID as an INPUT, however
they are not connected. This must be done in the CPU class where both the ID and IF are
instantiated.
In the overview sketch you probably noticed the barrier between IF and ID.
In accordance with the overview, it is incorrect to directly connect the two modules,
instead you must connect them using a *barrier*.
A barrier is responsible for keeping a value between cycles, facilitating pipelining.
There is however one complicating matter: It takes a cycle to get the instruction from the
instruction memory, thus we don't want to delay it in the barrier!

In order to make code readable I suggest adding a new file for your barriers, containing
four different modules for the barriers your design will need.

Start with implementing your IF barrier module, which should contain the following:
+ An input and output for PC where the output is delayed by a single cycle.
+ An input and output for instruction where the output is wired directly to the input with
  no delay.
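
The two requirements above could be sketched in Chisel roughly as follows. All names (~IFBarrierSketch~, ~PCIn~, ~instructionOut~ and so on) are hypothetical, and ~UInt(32.W)~ stands in for the skeleton's instruction type to keep the example self-contained.

#+BEGIN_SRC scala
import chisel3._

// Hypothetical barrier between IF and ID.
class IFBarrierSketch extends Module {
  val io = IO(new Bundle {
    val PCIn           = Input(UInt(32.W))
    val PCOut          = Output(UInt(32.W))
    val instructionIn  = Input(UInt(32.W))
    val instructionOut = Output(UInt(32.W))
  })

  // The PC is held in a register, so the output lags the input by one cycle.
  val PCReg = RegInit(0.U(32.W))
  PCReg    := io.PCIn
  io.PCOut := PCReg

  // The instruction memory already takes a cycle to respond, so the
  // instruction is passed straight through without a register.
  io.instructionOut := io.instructionIn
}
#+END_SRC

In the CPU module the barrier would then sit between the two stages: IF drives the barrier's inputs and ID reads its outputs.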

The sketch for your barrier looks like this:
#+CAPTION: The barrier between IF and ID. Note the passthrough for the instruction.
[[./Images/IFID.png]]

**** Step 4½:
You can now verify that the correct control signals are produced. Using printf, ensure
that:
+ The program counter is increasing in increments of 4
+ The instruction in ID is as expected
+ The decoder output is as expected
+ The correct operands are fetched from the registers
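
A minimal sketch of such printf checks is shown below. The module and signal names are placeholders chosen for this example; in practice you would put the ~printf~ calls directly inside your ID module and print whatever signals it actually has.

#+BEGIN_SRC scala
import chisel3._

// Hypothetical module that only prints its inputs.
class DebugPrintSketch extends Module {
  val io = IO(new Bundle {
    val PC          = Input(UInt(32.W))
    val instruction = Input(UInt(32.W))
    val op1         = Input(UInt(32.W))
    val op2         = Input(UInt(32.W))
  })

  // Chisel's printf fires on every clock cycle during simulation.
  // %x prints hexadecimal, %d prints decimal.
  printf("PC = 0x%x, instruction = 0x%x\n", io.PC, io.instruction)
  printf("op1 = %d, op2 = %d\n", io.op1, io.op2)
}
#+END_SRC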

Keep in mind that printf might not always be cycle accurate, the point is to ensure that
your processor design at least does something! In general it is better to use debug signals
and println, but for quick and dirty debugging printf is passable.

*** Step 5:
You will now have to create the EX stage. Use the structure of the IF and ID modules to
guide you here.
In your EX stage you should have an ALU, preferably in its own module a la registers in ID.
While the ALU is hugely complex, it's very easy to describe in hardware design languages!
Using the same approach as in the decoder should be sufficient:

#+BEGIN_SRC scala
// Each entry pairs an ALU operation constant with the result it selects.
val ALUopMap = Array(
  ADD -> (io.op1 + io.op2),
  SUB -> (io.op1 - io.op2),
  ...
  )

// MuxLookup API: https://github.com/freechipsproject/chisel3/wiki/Muxes-and-Input-Selection#muxlookup
// The result is 0 when io.aluOp matches none of the entries.
io.aluResult := MuxLookup(io.aluOp, 0.U(32.W), ALUopMap)
#+END_SRC

As with the ID stage, you will need a barrier between ID and EX stage.
In this case, as the overview sketch indicates, all values should be delayed one cycle.

When you have finished the barrier, instantiate it and wire ID and EX together with the barrier in the
same fashion as IF and ID.
You don't need to add every single signal for your barrier, rather you should add them as they
become needed.
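
A barrier where every value is delayed can be sketched with ~RegNext~, as below. The signal set and names are assumptions made for this example (including the 4-bit width of ~aluOp~); your own barrier will carry whichever signals your design ends up needing.

#+BEGIN_SRC scala
import chisel3._

// Hypothetical barrier between ID and EX; every signal is registered.
class IDBarrierSketch extends Module {
  val io = IO(new Bundle {
    val op1In    = Input(UInt(32.W))
    val op2In    = Input(UInt(32.W))
    val aluOpIn  = Input(UInt(4.W)) // width is an assumption
    val op1Out   = Output(UInt(32.W))
    val op2Out   = Output(UInt(32.W))
    val aluOpOut = Output(UInt(4.W))
  })

  // RegNext creates a register whose output is the input from the
  // previous cycle, which is exactly the one-cycle delay a barrier needs.
  io.op1Out   := RegNext(io.op1In)
  io.op2Out   := RegNext(io.op2In)
  io.aluOpOut := RegNext(io.aluOpIn)
}
#+END_SRC

Adding a new signal later then amounts to one new input, one new output and one more ~RegNext~ line.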

*** Step 6:
Your MEM stage does very little when an ADDI instruction is executed, so implementing it should
be easy. All you have to do is forward signals.

From the overview sketch you can see that the same trick used in the IF/ID barrier is utilized
here, bypassing the data memory read value since it is already delayed by a cycle.

*** Step 7:
You now need to actually write the result back to your register bank.
This should be handled at the CPU level.
If you sketched your processor already you probably made sure to keep track of the control
signals for the instruction currently in WB, so writing to the correct register address should
be easy for you ;)

If you ended up driving the register write address with the instruction from IF you should take
a moment to reflect on why that was the wrong choice.
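
One way to picture "keeping track" of a control signal is to let it travel through one register per barrier alongside the data it belongs to. The sketch below chains the registers in a single module purely for illustration; in an actual design each register would live in its respective barrier. All names are hypothetical, and the count of three barriers between ID and WB is an assumption based on the four-barrier layout described earlier.

#+BEGIN_SRC scala
import chisel3._

// Hypothetical illustration: the destination register address and the
// register write enable decoded in ID, delayed until they reach WB.
class WriteBackTrackingSketch extends Module {
  val io = IO(new Bundle {
    val rdFromID       = Input(UInt(5.W))
    val regWriteFromID = Input(Bool())
    val rdInWB         = Output(UInt(5.W))
    val regWriteInWB   = Output(Bool())
  })

  // One register per barrier between ID and WB (assumed to be three).
  val rdEX  = RegNext(io.rdFromID)
  val rdMEM = RegNext(rdEX)
  val rdWB  = RegNext(rdMEM)

  // The write enable is reset to false so nothing is written right after reset.
  val regWriteEX  = RegNext(io.regWriteFromID, false.B)
  val regWriteMEM = RegNext(regWriteEX, false.B)
  val regWriteWB  = RegNext(regWriteMEM, false.B)

  io.rdInWB       := rdWB
  io.regWriteInWB := regWriteWB
}
#+END_SRC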

*** Step 8:
Ensure that the simplest addi test works, and give yourself a pat on the back!
You've just found the corner pieces of the puzzle, so filling in the rest is "simply" being methodical.

* Delivery
Once you are done simply run the deliver.sh script to get an archive.