TDT4255/exercise.org
2019-06-07 20:01:21 +02:00

12 KiB

Exercise 1

The task in this exercise is to implement a 5-stage pipelined processor for the RISCV32I instruction set.

For exercise 1 you will build a 5-stage processor which handles one instruction at a time, whereas in exercise 2 your design will handle multiple instructions at a time. This is done by inserting 4 NOP instructions inbetween each source instruction, enabling us to use the same tests for both exercise 1 and 2.

In the project skeleton files (Found here) you can see that a lot of code has already been provided.

Before going further it is useful to get an overview of what is provided out of the box.

Tests

In addition to the skeleton files it's useful to take a look at how the tests work. You will not need to alter anything here other than the test manifest, but some of these settings can be quite useful to alter. The main attraction is the test options. By altering the verbosity settings you may change what is output. The settings are

  • printIfSuccessful Enables logging on tests that succeed
  • printErrors Enables logging of errors. You obviously want this one on, at least on the single test.
  • printParsedProgram Prints the desugared program. Useful when the test asm contains instructions that needs to be expanded or altered. Unsure what "bnez" means? Turn this setting on and see!
  • printVMtrace Enables printing of the VM trace, showing how the ideal machine executes a test
  • printVMfinal Enables printing of the final VM state, showing how the registers look after completion. Useful if you want to see what a program returns.
  • printMergedTrace Enables printing of a merged trace. With this option enabled you get to see how the VM and your processor executed the program side by side. This setting is extremely helpful to track down where your program goes wrong! This option attempts to synchronize the execution traces as best as it can, however once your processor design derails this becomes impossible, leading to rather nonsensical output. Instructions that were only executed by either VM or Your design is colored red or blue. IF YOU ARE COLOR BLIND YOU SHOULD ALTER THE DISPLAY COLORS!
  • nopPadded Set this to false when you're ready to enter the big-boy league
  • breakPoints Not implemented. It's there as a teaser, urging you to implement it so I don't have to.

Getting started

In order to make a correct design in a somewhat expedient fashion you need to be methodical!

This means you should have a good idea of how your processor should work before you start writing code. While chisel is more pleasent to work with than other HDLs the bricoleur approach is not recommended.

My recommended approach is therefore to create an RTL sketch of your processor design. Start with an overall sketch showing all the components, then drill down. In your sketch you will eventually add a box for registers, IMEM and DMEM, which should make it clear how the already finished modules fit into the grander design, making the skeleton-code less mysterious.

Adding numbers

In order to get started designing your processor the following steps guide you to implementing the necessary functionality for adding two integers.

Info is progressively being omitted in the latter steps in order to not bog you down in repeated details. After all brevity is ~the soul of~ wit

Step 0

In order to verify that the project is set up properly, open sbt in your project root by typing ./sbt.sh (or simply sbt if you already use scala). sbt, which stands for scala build tool will provide you with a repl where you can compile and test your code.

The initial run will take quite a while to boot as all the necessary stuff is downloaded.

Step ¼:

In your console, type compile to verify that everything compiles correctly.

Step ½:

In your console, type test to verify that the tests run, and that chisel can correctly build your design. This command will unleash the full battery of tests on you.

Step ¾:

In your console, type testOnly FiveStage.SingleTest to run only the tests that you have defined in the test manifest (currently set to "forward2.s").

As you will first implement addition you should change this to the add immediate test. Luckily you do not have to deal with file paths, simply changing "forward2.s" to "addi.s" suffices.

Ensure that the addi test is run by repeating the testOnly FiveStage.SingleTest command.

Step 1:

In order to execute instructions your processor must be able to fetch them. In /kaholaz/TDT4255/src/commit/4a4717b31291caecc4915f41dd20f17d55f5e3c8/src/test/main/IF.scala you can see that the IMEM module is already set to fetch the current program counter address (line 41), however since the current PC is stuck at 0 it will fetch the same instruction over and over. Rectify this by commenting in // PC := PC + 4.U at line 43. You can now verify that your design fetches new instructions each cycle by running the test as in the previous step.

Step 2:

Next, the instruction must be forwarded to the ID stage, so you will need to add the instruction to the io interface of the IF module as an output signal. In /kaholaz/TDT4255/src/commit/4a4717b31291caecc4915f41dd20f17d55f5e3c8/src/test/main/IF.scala at line 21 you can see how the program counter is already defined as an output. You should do the same with the instruction signal.

Step 3:

As you defined the instruction as an output for your IF module, declare it as an input in your ID module (/kaholaz/TDT4255/src/commit/4a4717b31291caecc4915f41dd20f17d55f5e3c8/src/test/main/ID.scala line 21).

Next you need to ensure that the registers and decoder gets the relevant data from the instruction.

This is made more convenient by the fact that `Instruction` is a class, allowing you to access methods defined on it. Keep in mind that it is only a class at compile and synthesis time, it will be indistinguishable from a regular UIint(32.W) in your finished circuit. The methods can be accessed like this:

// Drive funct6 of myModule with the 26th to 31st bit of instruction
myModule.io.funct6 := io.instruction.funct6

Step 4:

Your IF should now have an instruction as an OUTPUT, and your ID as an INPUT, however they are not connected. This must be done in the CPU class where both the ID and IF are instantiated.

Step 4½:

You should now verify that the correct control signals are produced. Using printf, ensure that:

  • The program counter is increasing in increments of 4
  • The instruction in ID is as expected
  • The decoder output is as expected
  • The correct operands are fetched from the registers

Keep in mind that printf might not always be cycle accurate, the point is to ensure that your processor design at least does something!

Step 5:

You will now have to create the EX stage. Use the structure of the IF and ID modules to guide you here. In your EX stage you should have an ALU, preferrable in its own module a la registers in ID. While the ALU is hugely complex, it's very easy to describle in hardware design languages! Using the same approach as in the decoder should be sufficient:

val ALUopMap = Array(
  ADD    -> (io.op1 + io.op2),
  SUB    -> (io.op1 - io.op2),
  ...
  )

io.aluResult := MuxLookup(0.U(32.W), io.aluOp, ALUopMap)

Step 6:

Your MEM stage does very little when an ADDI instruction is executed, so implementing it should be easy. All you have to do is forward signals

Step 7:

You now need to actually write the result back to your register bank. This should be handled at the CPU level. If you sketched your processor already you probably made sure to keep track of the control signals for the instruction currently in WB, so writing to the correct register address should be easy for you ;)

If you ended up driving the register write address with the instruction from IF you should take a moment to reflect on why that was the wrong choice.

Step 8:

Ensure that the simplest addi test works, and give yourself a pat on the back! You've just found the corner pieces of the puzzle, so filling in the rest is "simply" being methodical.

Delivery

Once you are done simply run the deliver.sh script to get an archive.