TDT4255/exercise.org
2019-06-07 17:43:33 +02:00

Exercise 1 & 2

The task in this exercise is to implement a 5-stage pipelined processor for the RISC-V RV32I instruction set.

You will use the skeleton code, which comes with a few freebies, namely the registers, instruction memory and data memory.

These are contained in the files Registers.scala, Imem.scala and Dmem.scala.

Getting started

In order to make a correct design in a somewhat expedient fashion you need to be methodical!

This means you should have a good idea of how your processor should work before you start writing code. While Chisel is more pleasant to work with than other HDLs, the bricoleur approach (tinkering until something happens to work) is not recommended.

My recommended approach is therefore to create a sketch of your processor design. Start with an overall sketch showing all the components, then drill down. Your sketch will eventually include boxes for the registers, IMEM and DMEM, which should make it clear how the already finished modules fit into the grander design, making the skeleton code less mysterious.
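When drawing the sketch, it can help to tabulate which instruction occupies which stage in each cycle. The snippet below is a plain-Scala model, not Chisel hardware, assuming an ideal pipeline that fetches one instruction per cycle and never stalls or flushes; `PipelineTiming`, `stageOf` and `occupancy` are hypothetical names used only for illustration.

```scala
// Plain-Scala model (not Chisel) of stage occupancy in an ideal 5-stage pipeline.
// Assumptions: one fetch per cycle, instruction i is fetched in cycle i,
// and there are no stalls or flushes.
object PipelineTiming {
  val stages: Vector[String] = Vector("IF", "ID", "EX", "MEM", "WB")

  /** Stage occupied during `cycle` by instruction number `instr`, if it is in flight. */
  def stageOf(instr: Int, cycle: Int): Option[String] = {
    val depth = cycle - instr // how many stages the instruction has advanced
    if (depth >= 0 && depth < stages.length) Some(stages(depth)) else None
  }

  /** Map from stage name to the instruction number occupying it during `cycle`. */
  def occupancy(cycle: Int, programLength: Int): Map[String, Int] =
    (0 until programLength).flatMap(i => stageOf(i, cycle).map(stage => stage -> i)).toMap
}
```

For a five-instruction program, `PipelineTiming.occupancy(4, 5)` has instruction 0 in WB while instruction 3 is being decoded, which is why the sketch needs control signals that travel along with each instruction rather than being taken from whatever ID is currently decoding.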

Next, your focus should be to get the simplest possible program working: a program that performs a single add operation. Information is progressively omitted in the later steps; after all, brevity is ~the soul of~ wit.

Step 0: To verify that the project is set up properly, open sbt in your project root by typing ./sbt (or simply sbt if you already have sbt installed). sbt, the Scala build tool, provides an interactive shell where you can compile and test your code.

The first run will take quite a while to start, since sbt downloads all the necessary dependencies.

Step ¼: In your console, type `compile` to verify that everything compiles correctly.

Step ½: In your console, type `test` to verify that the tests run and that Chisel can correctly build your design. This command will unleash the full battery of tests on you.

Step ¾: In your console, type `testOnly FiveStage.SelectedTests` to run only the tests that you have listed in the testConf.scala file. In the skeleton this runs only the simple add test, but you should extend this list with more complex tests as you build your processor, so that it serves as a stopgap between running single tests and the full battery.

Be aware that Chisel will make quite a lot of noise while the tests run. Sadly, I'm not aware of a good way to get rid of this.

Step 1: To run even this single add, your processor must be able to fetch new instructions, so in IF.scala you must increment the PC.

Step 2: Next, the instruction must be forwarded to the ID stage, so you will need to add the instruction as an output in the io bundle of InstructionFetch.
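As a reference for what the PC should do, here is a plain-Scala model (not Chisel) for straight-line code, assuming byte addressing and no branches or jumps; `PcModel`, `nextPc` and `trace` are hypothetical names.

```scala
// Plain-Scala model (not Chisel) of the PC in the IF stage.
// Assumptions: byte-addressed instruction memory, 4-byte instructions,
// and no branches or jumps, so the PC simply advances by 4 every cycle.
object PcModel {
  val Mask32: Long = 0xFFFFFFFFL

  /** Next PC for straight-line code, wrapping at 32 bits like a UInt(32.W) register. */
  def nextPc(pc: Long): Long = (pc + 4) & Mask32

  /** The first `n` PC values, starting from `start`. */
  def trace(start: Long, n: Int): List[Long] =
    Iterator.iterate(start)(nextPc).take(n).toList
}
```

`PcModel.trace(0L, 4)` yields 0, 4, 8, 12, which is also the sequence the printf check in step 4½ should show. In the hardware, the counterpart is a 32-bit register updated with `pc + 4.U` each cycle; Chisel's `+` drops the carry out, matching the wraparound modelled here.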

Step 3: Your ID stage must take an instruction as input in its io bundle and decode it. In the skeleton code a decoder has already been instantiated in the InstructionDecode module, but it is given a dummy instruction. Likewise, you must ensure that the register file receives the relevant register addresses. This can be done using the instruction class methods (TopLevelSignals.scala), which let you access the relevant part of the instruction with the dot operator. For instance:

myModule.io.funct6 := io.instruction.funct6

drives the funct6 input of `myModule` with bits 31 down to 26 (zero-indexed) of `instruction`.
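To check decoder output by hand, it helps to know where each field sits in the instruction word. Below is a plain-Scala reference, not Chisel, assuming the standard RV32I field layout; the object name `Fields` is hypothetical, and in hardware you should keep using the skeleton's accessors from TopLevelSignals.scala.

```scala
// Plain-Scala reference (not Chisel) for RV32I instruction fields.
// The 32-bit instruction word is held in a Long so it is always non-negative.
object Fields {
  /** Bits hi..lo (inclusive, zero-indexed) of a 32-bit instruction word. */
  def bits(instr: Long, hi: Int, lo: Int): Long =
    (instr >>> lo) & ((1L << (hi - lo + 1)) - 1)

  def opcode(i: Long): Long = bits(i, 6, 0)
  def rd(i: Long): Long     = bits(i, 11, 7)
  def funct3(i: Long): Long = bits(i, 14, 12)
  def rs1(i: Long): Long    = bits(i, 19, 15)
  def rs2(i: Long): Long    = bits(i, 24, 20)
  def funct7(i: Long): Long = bits(i, 31, 25)
  def funct6(i: Long): Long = bits(i, 31, 26) // the slice used in the example above
}
```

For `add x3, x1, x2`, encoded as 0x002081B3, this gives opcode 0x33, rd 3, funct3 0, rs1 1 and rs2 2. `sub x3, x1, x2` (0x402081B3) differs only in funct7 being 0x20, so the decoder must look at funct7 to tell ADD from SUB. In Chisel the same extraction is a bit slice such as `instruction(31, 26)`.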

Step 4: Your IF should now have an instruction as an OUTPUT and your ID one as an INPUT, but they are not yet connected. The connection must be made in the CPU class, where both ID and IF are instantiated.

Step 4½: You should now verify that the correct control signals are produced. Using printf, ensure that:

  • The program counter is increasing in increments of 4
  • The instruction in ID is as expected
  • The decoder output is as expected
  • The correct operands are fetched from the registers

Step 5: You will now have to create the EX stage. Use the structure of the IF and ID modules to guide you here. Your EX stage should contain an ALU, preferably in its own module, just as the registers are their own module in ID. While an ALU is a hugely complex circuit, it is very easy to describe in a hardware description language! Using the same approach as in the decoder should be sufficient:

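val ALUopMap = Array(
  ADD    -> (io.op1 + io.op2),
  SUB    -> (io.op1 - io.op2)
  // ... remaining operations, separated by commas
  )

// MuxLookup(key, default, mapping) (from chisel3.util): selects the result whose
// key equals io.aluOp, or 0 when no entry matches
io.aluResult := MuxLookup(io.aluOp, 0.U(32.W), ALUopMap)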
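When a test fails, a software reference for the ALU makes it easier to tell whether the bug is in the ALU or elsewhere. Below is a plain-Scala model, not Chisel, of the RV32I register-register ALU operations, with operands held as unsigned 32-bit values in a `Long`; `AluModel` and its `Op` cases are hypothetical names and do not model the skeleton's ALU op encoding.

```scala
// Plain-Scala reference ALU (not Chisel) for checking simulation results.
// Operands and results are unsigned 32-bit values stored in a Long.
object AluModel {
  sealed trait Op
  case object ADD  extends Op
  case object SUB  extends Op
  case object AND  extends Op
  case object OR   extends Op
  case object XOR  extends Op
  case object SLL  extends Op
  case object SRL  extends Op
  case object SRA  extends Op
  case object SLT  extends Op
  case object SLTU extends Op

  private val Mask32 = 0xFFFFFFFFL

  def apply(op: Op, a: Long, b: Long): Long = {
    val shamt = (b & 0x1F).toInt // RV32I shifts use only the low 5 bits of op2
    val sa = a.toInt             // signed (two's complement) view of op1
    val sb = b.toInt             // signed view of op2
    val r: Long = op match {
      case ADD  => a + b
      case SUB  => a - b
      case AND  => a & b
      case OR   => a | b
      case XOR  => a ^ b
      case SLL  => a << shamt
      case SRL  => a >>> shamt
      case SRA  => (sa >> shamt).toLong // arithmetic shift keeps the sign bit
      case SLT  => if (sa < sb) 1L else 0L
      case SLTU => if (a < b) 1L else 0L
    }
    r & Mask32 // truncate to 32 bits, like a UInt(32.W) result
  }
}
```

For example, `AluModel(AluModel.SUB, 0L, 1L)` wraps to 0xFFFFFFFF, and SLT and SLTU disagree on that value compared with 0. In Chisel, the analogous step for the signed operations is reinterpreting the operands with `.asSInt` before comparing or shifting.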

Step 6: Your MEM stage does very little when an ADD instruction is executed, since ADD does not access memory, so implementing it should be easy.

Step 7: You now need to actually write the result back to your register bank. This should be handled at the CPU level. If you have already sketched your processor, you probably made sure to keep track of the control signals for the instruction currently in WB, so writing to the correct register address should be easy for you ;)

Step 8: Ensure that the simplest add test works, then give yourself a pat on the back. You've just found the corner pieces of the puzzle, so filling in the rest is "simply" a matter of being methodical.
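The key point of step 7 is that WB must use the destination register and write enable that travelled with its own instruction, since by then ID is decoding a later instruction. A plain-Scala sketch, not Chisel, follows; `WbCtrl`, `Pipe` and `RegFileModel` are hypothetical names, and only the write-back control signals are modelled.

```scala
// Plain-Scala sketch (not Chisel) of carrying write-back control down the pipeline.
// WbCtrl, Pipe and RegFileModel are hypothetical; the skeleton does not define them.
final case class WbCtrl(rd: Int, regWrite: Boolean)

/** Control signals held in the ID/EX, EX/MEM and MEM/WB pipeline registers. */
final case class Pipe(
    idEx: Option[WbCtrl] = None,
    exMem: Option[WbCtrl] = None,
    memWb: Option[WbCtrl] = None) {
  /** One clock edge: each register takes the value of the register before it. */
  def clock(fromId: Option[WbCtrl]): Pipe = Pipe(fromId, idEx, exMem)
}

/** Register file model; writes to x0 are discarded, since x0 is hardwired to zero. */
final class RegFileModel {
  private val regs = Array.fill(32)(0L)
  def read(r: Int): Long = regs(r)
  /** Write port driven by the MEM/WB register's control signals. */
  def writeBack(ctrl: Option[WbCtrl], value: Long): Unit = ctrl match {
    case Some(WbCtrl(rd, true)) if rd != 0 => regs(rd) = value & 0xFFFFFFFFL
    case _                                 => () // no write this cycle
  }
}
```

Starting from `Pipe()` and clocking in `add x3` followed by `add x4`, after three clock edges the `x3` control signals sit in MEM/WB while the `x4` ones are one register behind, so taking `rd` from ID instead would write the wrong register.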