Merge branch 'master' of https://github.com/PeterAaser/TDT4255_EX2

2020-06-29 16:19:56 +02:00
commit 2e37f0b8d7
parent 9f47433501 8dc92fb8e1
17 changed files with 478 additions and 260 deletions
--- a/Images/IDE.png
+++ b/Images/IDE.png
--- a/Images/merged.png
+++ b/Images/merged.png
--- a/Images/rasta.png
+++ b/Images/rasta.png
--- a/Images/wave1.png
+++ b/Images/wave1.png
--- a/Images/wave2.png
+++ b/Images/wave2.png
--- a/README.org
+++ b/README.org
@ -4,10 +4,7 @@ This is the coursework for the graded part of the TDT4255 course at NTNU.
 * Instructions
-  #+ATTR_HTML: title="Join the chat at https://gitter.im/RISCV-FiveStage/community"
+  To get started designing your 5-stage RISC-V pipeline you should read the [[./introduction.org][introduction]]
  [[https://gitter.im/RISCV-FiveStage/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge][file:https://badges.gitter.im/RISCV-FiveStage/community.svg]]
  To get started with designing your 5-stage RISC-V pipeline you should follow the
  [[./exercise.org][Exercise instructions]]
  If you want an introduction to chisel and hardware design you should do the [[https://github.com/PeterAaser/tdt4255-chisel-intro][Chisel Intro]] 
  exercise first.
--- a/exercise.org
+++ b/exercise.org
@ -1,229 +1,83 @@
-* Exercise description
+* Getting started
-  The task in this exercise is to implement a 5-stage pipelined processor for
+  In order to make a correct design in a somewhat expedient fashion you need to be
-  the [[./instructions.org][RISCV32I instruction set]].
+  *methodical!* 
-  For exercise 1 you will build a 5-stage processor which handles one instruction
+  This means you should have a good idea of how your processor should work *before*
-  at a time, whereas in exercise 2 your design will handle multiple instructions
+  you start writing code. While chisel is more pleasant to work with than other HDLs
-  at a time.
+  the [[https://i.imgur.com/6IpVNA7.jpg][bricoleur]] approach is not recommended.
  This is done by inserting 4 NOP instructions between each source instruction,
  enabling us to use the same tests and harness for both exercise 1 and 2.
-  Once you are done with exercise 1, you can up the difficulty by setting nopPad
+  My recommended approach is therefore to create an RTL sketch of your processor design.
-  to false and start reading the [[exercise2.org][ex2 guide]].
+  Start with an overall sketch showing all the components, then drill down.
-
+  In your sketch you will eventually add a box for registers, IMEM and DMEM, which
-  In the project skeleton files ([[./src/main/scala/][Found here]]) you can see that a lot of code has
+  should make it clear how the already finished modules fit into the grander design,
-  already been provided, which can make it difficult to get started.
+  making the skeleton-code less mysterious.
  Hopefully this document can help clear up at least some of the confusion.
  First an overview of what you are designing is presented, followed by a walk-through
  for getting the most basic instructions to work.
-  In order to orient yourself you first need a map, thus a high level overview of the 
+  To give you *an idea* of what a drill-down looks like, here is my sketch of the ID stage:
-  processor you're going to design is showed underneath:
+  Note that this sketch does not contain everything an ID stage needs to contain, this is
-  Keep in mind that this is just a high level sketch, omitting many details as well
+  simply an example. First draft was made on paper.
-  entire features (for instance branch logic)
+  #+CAPTION: Instruction decode stage, showing the various signals.
-
+  #+attr_html: :width 1000px
-  *Important*
+  #+attr_latex: :width 1000px
  When you are done, use the provided ./deliver.sh script to pack up the archive.
  If you're unable to run bash scripts then please ensure that you deliver a *zip* archive.
  Not .rar or anything else, just use zip because my grading script knows how to handle that
  in addition to the one used by deliver.sh
  named after your username. Nothing more, nothing less, just your username.
  This archive should be runnable as is, thus you need to include all the necessary files.
  (I may or may not diff the tests to check if you're screwing with them)
  #+CAPTION: A very high level processor schematic. Registers, Instruction and data memory are already implemented.
  [[./Images/FiveStage.png]]
  Now that you have an idea of what you're building it is time to take inventory of
  the files included in the skeleton, and what, if anything should be added.
  + [[./src/main/scala/Tile.scala]]
    This is the top level module for the system as a whole. This is where the test
    harness accesses your design, providing the necessary IO.
    *You should not modify this module for other purposes than debugging.*
  + [[./src/main/scala/CPU.scala]]
    This is the top level module for your processor.
    In this module the various stages and barriers that make up your processor
    should be declared and wired together.
    Some of these modules have already been declared in order to wire up the
    debugging logic for your test harness.
    This file corresponds to the high-level overview in its entirety.
    *This module is intended to be further fleshed out by you.*
    As you work with this module, try keeping logic to a minimum to help readability.
    If you end up with a lot of signal select logic, consider moving that to a separate
    module.
  + [[./src/main/scala/IF.scala]]
    This is the instruction fetch stage.
    In this stage instruction fetching should happen, meaning you will have to
    add logic for handling branches, jumps, and for exercise 2, stalls.
    The reason this module is already included is that it contains the instruction
    memory, described next which is heavily coupled to the testing harness.
    *This module is intended to be further fleshed out by you.*
  + [[./src/main/scala/IMem.scala]]
    This module contains the instruction memory for your processor.
    Upon testing the test harness loads your program into the instruction memory,
    freeing you from the hassle.
    *You should not modify this module for other purposes than maaaaybe debugging.*
  + [[./src/main/scala/ID.scala]]
    The instruction decode stage.
    The reason this module is included is that the registers reside here, thus
    for the test harness to work it must be wired up to the register unit to
    record its state updates.
    *This module is intended to be further fleshed out by you.*
  + [[./src/main/scala/Registers.scala]]
    Contains the registers for your processor. Note that the zero register is already
    disabled, you do not need to do this yourself.
    The test harness ensures that all register updates are recorded.
    *You should not modify this module for other purposes than maaaaybe debugging.*
  + [[./src/main/scala/MEM.scala]]
    Like ID and IF, the MEM skeleton module is included so that the test harness
    can set up and monitor the data memory
    *This module is intended to be further fleshed out by you.*
  + [[./src/main/scala/DMem.scala]]
    Like the registers and Imem, the DMem is already implemented.
    *You should not modify this module for other purposes than maaaaybe debugging.*
  + [[./src/main/scala/Const.scala]]
    Contains helpful constants for decoding, used by the decoder which is provided.
    *This module may be fleshed out further by you if you so choose.*
  + [[./src/main/scala/Decoder.scala]]
    The decoder shows how to conveniently demux the instruction.
    In the provided ID.scala file a decoder module has already been instantiated.
    You should flesh it out further.
    You may find it useful to alter this module, especially in exercise 2.
    *This module should be further fleshed out by you.*
  + [[./src/main/scala/ToplevelSignals.scala]]
    Contains helpful constants. 
    You should add your own constants here when you find the need for them.
    You are not required to use it at all, but it is very helpful.
    *This module can be further fleshed out by you.*
  + [[./src/main/scala/SetupSignals.scala]]
    You should obviously not modify this file.
    You may choose to create a similar file for debug signals, modeled on how
    the test harness is built.
    *You should not modify this module at all.*
 ** Tests
   In addition to the skeleton files it's useful to take a look at how the tests work.
   You will not need to alter anything here other than the [[./src/test/scala/Manifest.scala][test manifest]], but some
   of these settings can be quite useful to alter.
   The main attraction is the test options. By altering the verbosity settings you
   may change what is output.
   The settings are:
   + printIfSuccessful
     Enables logging on tests that succeed.
     You typically want this turned off, at least for the full test runner.
   + printErrors
     Enables logging of errors. You obviously want this one on, at least on the single
     test.
   + printParsedProgram
     Prints the desugared program. Useful when the test asm contains instructions that
     need to be expanded or altered.
     Unsure what "bnez" means? Turn this setting on and see!
   + printVMtrace
     Enables printing of the VM trace, showing how the ideal machine executes a test
   + printVMfinal
     Enables printing of the final VM state, showing how the registers look after
     completion. Useful if you want to see what a program returns.
   + printMergedTrace
     Enables printing of a merged trace. With this option enabled you get to see how
     the VM and your processor executed the program side by side.
     This setting is extremely helpful to track down where your program goes wrong!
     This option attempts to synchronize the execution traces as best as it can, however
     once your processor design derails this becomes impossible, leading to rather
     nonsensical output.
     Instructions that were only executed by either the VM or your design are colored red or
     blue.
     *IF YOU ARE COLOR BLIND YOU SHOULD ALTER THE DISPLAY COLORS!*
   + nopPadded
     Set this to false when you're ready to enter the big-boy league
   + breakPoints
     Not implemented. It's there as a teaser, urging you to implement it so I don't have to.
 ** Getting started
   In order to make a correct design in a somewhat expedient fashion you need to be
   *methodical!* 
   This means you should have a good idea of how your processor should work *before*
   you start writing code. While chisel is more pleasant to work with than other HDLs
   the [[https://i.imgur.com/6IpVNA7.jpg][bricoleur]] approach is not recommended.
   My recommended approach is therefore to create an RTL sketch of your processor design.
   Start with an overall sketch showing all the components, then drill down.
   In your sketch you will eventually add a box for registers, IMEM and DMEM, which
   should make it clear how the already finished modules fit into the grander design,
   making the skeleton-code less mysterious.
   To give you an idea of what a drill-down looks like, here is my sketch of the ID stage:
   #+CAPTION: Instruction decode stage, showing the various signals.
   [[./Images/IDstage.png]]
-   I would generally advise doing these on paper, but don't half-ass them.
+  I would generally advise doing these on paper, but don't half-ass them.
  I advise you to use squared paper unless you are exceptionally talented at freehand drawing.
 ** Adding numbers
   In order to get started designing your processor the following steps guide you to
-   implementing the necessary functionality for adding two integers.
+   implementing the necessary functionality for adding two integers by implementing the
   ~ADDI~ operation which you can read about in [[instructions.org][the ISA overview.]]
-   Info is progressively being omitted in the latter steps in order to not bog you down
+   Info will be progressively omitted in the later steps in order to not bog you down
-   in repeated details. After all brevity is ~~the soul of~~ wit
+   in repetitive details. After all brevity is wit.
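   As a reminder of what ~ADDI~ looks like at the bit level, here is a small plain-Scala
   sketch (no Chisel involved) that splits an instruction word into its fields. It assumes
   the standard RV32I I-type layout; the object name ~AddiFields~ and the two example words
   are made up for illustration and are not part of the skeleton.
   #+BEGIN_SRC scala
   object AddiFields {
     // Standard RV32I I-type layout: imm[31:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
     def opcode(instr: Int): Int = instr & 0x7f
     def rd(instr: Int): Int     = (instr >>> 7) & 0x1f
     def funct3(instr: Int): Int = (instr >>> 12) & 0x7
     def rs1(instr: Int): Int    = (instr >>> 15) & 0x1f
     // An arithmetic shift keeps the sign bit, which sign-extends the 12-bit immediate
     def imm(instr: Int): Int    = instr >> 20

     def main(args: Array[String]): Unit = {
       val addiPos = 0x00500093 // addi x1, x0, 5
       val addiNeg = 0xFFF08113 // addi x2, x1, -1
       println(s"rd=${rd(addiPos)} rs1=${rs1(addiPos)} imm=${imm(addiPos)}") // rd=1 rs1=0 imm=5
       println(s"rd=${rd(addiNeg)} rs1=${rs1(addiNeg)} imm=${imm(addiNeg)}") // rd=2 rs1=1 imm=-1
     }
   }
   #+END_SRC
   In hardware the same split is done with bit extraction on the instruction signal, and the
   sign extension of the immediate is something your design has to handle explicitly.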
 *** Step 0
    In order to verify that the project is set up properly, open sbt in your project root
    by typing ~./sbt.sh~ (or simply sbt if you already use scala).
    sbt, which stands for scala build tool will provide you with a repl where you can
-    compile and test your code.
+    compile and test your code. This should be familiar from the chisel introduction exercise.
    If you have not done this I advise you to complete it first, it can be found here: [[https://github.com/PeterAaser/tdt4255-chisel-intro][Chisel Intro]] 
-    The initial run will take quite a while to boot as all the necessary stuff is downloaded.
+    The initial run might take quite a while to finish as all the necessary stuff is downloaded.
 **** Step ¼:
-     In your console, type ~compile~ to verify that everything compiles correctly.
+     In your sbt console, type ~compile~ to verify that everything compiles correctly.
 **** Step ½:
     In your console, type ~test~ to verify that the tests run, and that chisel can correctly
     build your design.
-     This command will unleash the full battery of tests on you.
+     This command will unleash the full battery of tests on you, so be prepared for a lot of
     console output.
 **** Step ¾:
-     In your console, type ~testOnly FiveStage.SingleTest~ to run only the tests that you
+     To reduce the number of tests being run you need to modify the [[./src/test/scala/Manifest.scala][test manifest]].
-     have defined in the [[./src/test/scala/Manifest.scala][test manifest]] (currently set to ~forward2.s~).
+     In the very top of the ~Manifest~ object, change the value of ~singleTest~ from ~"forward2.s"~ 
     to ~"addi.s"~. It is not necessary to deal with filepaths, all tests are globally visible and
     should preferably not share names even if they are in different directories.
     The full battery of tests can be found in [[./src/test/resources/tests/]] but for now focus on
     [[./src/test/resources/tests/basic/immediate/addi.s]] which as you can see is an assembly file
     consisting only of ~addi~ instructions.
-     As you will first implement addition you should change this to the [[./src/test/resources/tests/basic/immediate/addi.s][add immediate test]].
+     When running the following in the sbt console: ~testOnly FiveStage.SingleTest~
-     Luckily you do not have to deal with file paths, simply changing ~forward2.s~ to
+     only the test pointed at in ~Manifest.singleTest~ will be run, and the log will be more
-     ~addi.s~ suffices.
+     thorough.
-
+     For now the log will be rather confusing since your processor is doing nothing.
-     Ensure that the addi test is run by repeating the ~testOnly FiveStage.SingleTest~
+     As you follow the guide you will see the output log making more sense!
-     command.
+     
     Now that only one test is running it is time to take it from red to green.
-*** Step 1:
+*** Step 1: Starting the clock
    In order to execute instructions your processor must be able to fetch them.
    In [[./src/main/scala/IF.scala]] you can see that the IMEM module is already set to fetch
    the current program counter address (line 41), however since the current PC is stuck
    at 0 it will fetch the same instruction over and over. Rectify this by uncommenting
    ~// PC := PC + 4.U~ at line 48.
    You can now verify that your design fetches new instructions each cycle by running
-    the test as in the previous step.
+    the test as in the previous step. The log should now be much shorter since the tester
    will be able to synchronize the clock and correctly deduce that the DUT (device under test) 
    is not doing anything.
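    The idea behind this step can be sketched as a tiny stand-alone Chisel module. This is
    only an illustration: ~PcCounter~ is a hypothetical name, and the real ~IF.scala~ in the
    skeleton contains more IO and the instruction memory.
    #+BEGIN_SRC scala
    import chisel3._

    // Hypothetical stand-in for the PC logic of the fetch stage
    class PcCounter extends Module {
      val io = IO(new Bundle {
        val PC = Output(UInt(32.W))
      })

      // Program counter register, reset to address 0
      val PC = RegInit(0.U(32.W))

      // RV32I instructions are 4 bytes wide, so the next sequential
      // instruction is found at PC + 4
      PC := PC + 4.U

      io.PC := PC
    }
    #+END_SRC
    Without the ~PC := PC + 4.U~ line the register keeps its reset value forever, which is
    why the same instruction was being fetched on every cycle.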
 *** Step 2:
    Next, the instruction must be forwarded to the ID stage, so you will need to add the
@ -231,17 +85,29 @@
    In [[./src/main/scala/IF.scala]] at line 21 you can see how the program counter is already
    defined as an output. 
    You should do the same with the instruction signal.
-
+    
    *Note*
    Even though an instruction is just a 32 bit signal it is very useful to treat it as
    a more refined type.
    In [[./src/main/scala/ToplevelSignals.scala]] you can see the ~Instruction~ class which
    comes with many useful helper methods.
    When defining an output for the instruction you need to define the type as follows:
    ~Output(new Instruction)~
    This is also explained in the comments in the file itself.
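    As a rough sketch of what such an IO declaration can look like, consider the following.
    The ~Instruction~ class here is a minimal stand-in written for this example (the real one
    lives in ~ToplevelSignals.scala~ and has many more helpers), and ~FetchIoSketch~ is a
    hypothetical module name.
    #+BEGIN_SRC scala
    import chisel3._

    // Minimal stand-in for the skeleton's Instruction class (assumed shape)
    class Instruction extends Bundle {
      val instruction = UInt(32.W)
      // Helper methods are plain Scala defs that slice the underlying bits
      def funct6 = instruction(31, 26)
    }

    // Hypothetical module showing only the IO declaration pattern
    class FetchIoSketch extends Module {
      val io = IO(new Bundle {
        val PC          = Output(UInt(32.W))
        val instruction = Output(new Instruction)
      })

      // Placeholders so the sketch elaborates on its own; in the real IF stage
      // these are driven by the PC register and the instruction memory
      io.PC          := 0.U
      io.instruction := 0.U(32.W).asTypeOf(new Instruction)
    }
    #+END_SRC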
 *** Step 3:
    As you defined the instruction as an output for your IF module, declare it as an input
    in your ID module ([[./src/main/scala/ID.scala]] line 21).
    This input should be defined nearly identical to step 2, only substituting ~Output~ with 
    ~Input~ like the following: ~Input(new Instruction)~
    Next you need to ensure that the registers and decoder get the relevant data from the
    instruction.
    This is made more convenient by the fact that ~Instruction~ is a class, allowing you
    to access methods defined on it.
    Your IDE should give you some hints as to what these methods are, but if you want to look 
    at the definition you can check out ~Instruction~ in ~ToplevelSignals.scala~.
    Keep in mind that it is only a class during compile and build time, it will be 
    indistinguishable from a regular ~UInt(32.W)~ in your finished circuit.
    The methods can be accessed like this:
@ -249,20 +115,30 @@
    // Drive funct6 of myModule with the 26th to 31st bit of instruction
    myModule.io.funct6 := io.instruction.funct6
    #+END_SRC
-
+    
 *** Step 4:
    Your IF should now have an instruction as an OUTPUT, and your ID as an INPUT, however
    they are not connected. This must be done in the CPU class where both the ID and IF are
    instantiated.
-    In the overview sketch you probably noticed the barriers between IF and ID.
+    In the overview sketch you probably noticed the *barriers* between IF and ID.
    In accordance with the overview, it is incorrect to directly connect the two modules,
-    instead you must connect them using a *barrier*.
+    instead you must connect them using a barrier.
    A barrier is responsible for keeping a value between cycles, facilitating pipelining.
    There is however one complicating matter: It takes a cycle to get the instruction from the
    instruction memory, thus we don't want to delay it in the barrier!
    It is not very conducive to learning to introduce a rule and then break it right away,
    however it *is* a good idea to highlight the importance of RTL sketches!
    If you look at the ID stage sketch at the top you can see that the Instruction memory block
    is overlapping with the IFID barrier register, reminding you that the instruction should not 
    be stored in the barrier.
    In order to make code readable I suggest adding a new file for your barriers, containing
    four different modules for the barriers your design will need.
    I prefer one file per barrier rather than one large file for all, but you can do as you
    please.
    Start with implementing your IF barrier module, which should contain the following:
    + An input and output for PC where the output is delayed by a single cycle.
@ -272,19 +148,41 @@
    The sketch for your barrier looks like this
    #+CAPTION: The barrier between IF and ID. Note the passthrough for the instruction
    [[./Images/IFID.png]]
-
+    
    *Hints*
    The instruction signal can be wired straight from input to output.
    The PC must be saved in a register. You can use ~RegInit(0.U(32.W))~ to define this register.
    By driving the register with the input PC and the output with the register you will attain
    a one cycle delay.
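    Put together, the hints above could be sketched roughly as follows. The port names
    (~PCin~, ~PCout~, ~instructionIn~, ~instructionOut~) and the module name ~IFBarrier~ are my
    own choices for this example, and ~Instruction~ is again a minimal stand-in for the class
    provided in the skeleton.
    #+BEGIN_SRC scala
    import chisel3._

    // Minimal stand-in for the skeleton's Instruction class (assumed shape)
    class Instruction extends Bundle {
      val instruction = UInt(32.W)
    }

    // Hypothetical barrier between IF and ID
    class IFBarrier extends Module {
      val io = IO(new Bundle {
        val PCin           = Input(UInt(32.W))
        val PCout          = Output(UInt(32.W))
        val instructionIn  = Input(new Instruction)
        val instructionOut = Output(new Instruction)
      })

      // The PC is stored in a register: driven by the input, read by the
      // output, which gives the one cycle delay
      val PCreg = RegInit(0.U(32.W))
      PCreg    := io.PCin
      io.PCout := PCreg

      // The instruction is passed straight through, since the instruction
      // memory already delays it by one cycle
      io.instructionOut := io.instructionIn
    }
    #+END_SRC
    In the CPU module the barrier would then sit between the two stages, with the IF outputs
    driving the barrier inputs and the barrier outputs driving the ID inputs.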
 **** Step 4½:
-     You can now verify that the correct control signals are produced. Using printf, ensure
+     You can now verify that the correct control signals are produced, either with printf or gtkwave. 
-     that:
+     Ensure that:
     + The program counter is increasing in increments of 4
     + The instruction in ID is as expected
     + The decoder output is as expected
     + The correct operands are fetched from the registers
-     Keep in mind that printf might not always be cycle accurate, the point is to ensure that
+     I advise you to use gtkwave first and foremost since it has a learning curve and is very useful.
-     your processor design at least does something! In general it is better to use debug signals
+     Unlike in the previous exercise, the outputs are now located in the waveform directory and are automatically
-     and println, but for quick and dirty debugging printf is passable.
+     produced each time you run a test.
     The following image shows gtkwave output with some formatting showing the desired results:
     [[./Images/wave1.png]]
     As you can see, this isn't very helpful, there's a little too much data, however it does verify that something is going on.
     If you followed the introduction you might have wondered how the bootloader works, which is what you are seeing here.
     While a program is being loaded the setup signal in the testHarness is true (1), thus you should zoom in on what happens as
     soon as the setup signal is set to false, which is when your processor starts working.
     By zooming in on this region you should see something similar (I've set data format to decimal in this image to make the output
     more readable).
     As you can see, the PC signal that ID receives is one cycle delayed compared to IF, whereas the instruction signal is not since
     it is one cycle delayed anyways.
     [[./Images/wave2.png]]
     You should also verify that ~registers~ get the correct signals.
 *** Step 5:
    You will now have to create the EX stage. Use the structure of the IF and ID modules to
    guide you here.
@ -309,14 +207,15 @@
    When you have finished the barrier, instantiate it and wire ID and EX together with the barrier in the 
    same fashion as IF and ID.
    You don't need to add every single signal for your barrier, rather you should add them as they
-    become needed.
+    become needed, i.e. if you need a signal in EX, wire it from ID.
 *** Step 6:
    Your MEM stage does very little when an ADDI instruction is executed, so implementing it should 
    be easy. All you have to do is forward signals.
    From the overview sketch you can see that the same trick used in the IF/ID barrier is utilized
-    here, bypassing the data memory read value since it is already delayed by a cycle.
+    here, bypassing the data memory read value since it is already delayed by a cycle, however ~addi~
    does not interact with the data memory so this can be omitted.
 *** Step 7:
    You now need to actually write the result back to your register bank. 
@ -325,9 +224,15 @@
    signals for the instruction currently in WB, so writing to the correct register address should
    be easy for you ;)
-    If you ended up driving the register write address with the instruction from IF you should take
+    Did you just realize that you had been driving ~registers.writeEnable~ and ~registers.writeAddress~
-    a moment to reflect on why that was the wrong choice.
+    with the instruction from the IFID barrier?
-
+    If so the signal is at the correct spot but at the wrong time!
    It is only when the instruction is fully computed that it should be written back, therefore the 
    control signals for register write enable and address are propagated through the pipeline at the 
    same pace as the instruction itself so that they reach the register module when the result is
    ready!
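    To illustrate the timing argument, here is a compressed sketch in which the write-back
    control signals pass through one register per barrier. In a real design each register
    would live in its own barrier module; they are collapsed into a single hypothetical module
    (~ControlPipeSketch~) here for brevity, and the bundle ~WriteBackControl~ is an assumption
    made for this example, not something taken from the skeleton.
    #+BEGIN_SRC scala
    import chisel3._

    // Hypothetical bundle of the control signals needed at write back
    class WriteBackControl extends Bundle {
      val writeEnable  = Bool()
      val writeAddress = UInt(5.W) // 32 registers need 5 address bits
    }

    class ControlPipeSketch extends Module {
      val io = IO(new Bundle {
        val fromID = Input(new WriteBackControl)
        val toWB   = Output(new WriteBackControl)
      })

      // Reset value with writeEnable low, so nothing is written after reset
      val idle = 0.U.asTypeOf(new WriteBackControl)

      // One register per barrier the signals have to cross
      val idEx  = RegNext(io.fromID, idle) // ID/EX barrier
      val exMem = RegNext(idEx, idle)      // EX/MEM barrier
      val memWb = RegNext(exMem, idle)     // MEM/WB barrier

      // Three cycles after decode the signals arrive together with the result
      io.toWB := memWb
    }
    #+END_SRC
    The exact number of registers depends on how you lay out your own barriers, so treat the
    three stages above as an example rather than a requirement.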
 *** Step 8:
    Ensure that the simplest addi test works, and give yourself a pat on the back!
    You've just found the corner pieces of the puzzle, so filling in the rest is "simply" being methodical.
--- a/instructions.org
+++ b/instructions.org
@ -2,7 +2,7 @@
 4.2. Register-Register Arithmetic Instructions
 --------------------------------------------------------------------------
-These do not render well on github, try using your text editor.
+If these do not render well on github, try using your text editor.
 * ADD
--- a/introduction.org
+++ b/introduction.org
@ -0,0 +1,214 @@
 * About RISCV-FiveStage
  The task in this exercise is to implement a 5-stage pipelined processor for
  the [[./instructions.org][RISCV32I instruction set]].
  This exercise framework is used for the two graded exercises in the processor
  design course TDT4255, however you are more than welcome to use this project
  yourself, or to teach a class. Please reach out if you do!
  If you are doing this as part of the TDT4255 course be sure to join our slack
  group. Slack links only last for a month, so the invite link will likely be
  expired. 
  Here it is anyways, feel free to join even if you're not taking the course at NTNU.
  https://join.slack.com/t/tdt4255-2020/shared_invite/zt-erb9fbnm-NscwZGNsVSTjYPnSCjo1aA
  In this exercise you will build a 5-stage RISCV32I processor that is able to run
  real RISC-V programs as long as they only use the 32I instruction subset.
  Since this is your first time building a processor, starting with a 5-stage design
  presents a very difficult challenge, which is why this exercise is split into two
  parts. In the first part the instructions will be interspersed with NOP instructions,
  four NOPs for every real instruction. This means that you do not need to take into account
  dependencies and so forth, making things a lot easier for you.
  For the second exercise the only difference is that NOP instructions will not be
  inserted. You can read about this in the [[exercise2.org][ex2 guide]]; it will not be discussed
  further here.
  In the project skeleton files ([[./src/main/scala/][Found here]]) you can see that a lot of code has
  already been provided, which can make it difficult to get started.
  Hopefully this document can help clear up at least some of the confusion.
  The rest of this document gives an overview of the exercise framework and testing. 
  If you want to jump straight to something practical you can start following the 
  [[exercise.org][exercise guide]], however at some point you should read through the rest of this document.
 ** A tour of FiveStage
   In order to orient yourself you first need a map, thus a high level overview of the 
   processor you're going to design is shown below:
   Keep in mind that this is just a high level sketch, omitting many details as well
   as entire features (for instance branch logic).
   *Important*
   When you are done, use the provided ./deliver.sh script to pack up the archive.
   If you're unable to run bash scripts then please ensure that you deliver a *zip* archive.
   Not .rar or anything else, just use zip because my grading script knows how to handle that
   in addition to the one used by deliver.sh
   named after your username. Nothing more, nothing less, just your username.
   This archive should be runnable as is, thus you need to include all the necessary files.
   (I may or may not diff the tests to check if you're screwing with them)
   #+CAPTION: A very high level processor schematic. Registers, Instruction and data memory are already implemented.
   #+attr_html: :width 1000px
   #+attr_latex: :width 1000px
   [[./Images/FiveStage.png]]
   Now that you have an idea of what you're building it is time to take inventory of
   the files included in the skeleton, and what, if anything should be added.
   + [[./src/main/scala/Tile.scala]]
     This is the top level module for the system as a whole. This is where the test
     harness accesses your design, providing the necessary IO.
     *You should not modify this module for other purposes than debugging.*
   + [[./src/main/scala/CPU.scala]]
     This is the top level module for your processor.
     In this module the various stages and barriers that make up your processor
     should be declared and wired together.
     Some of these modules have already been declared in order to wire up the
     debugging logic for your test harness.
     This file corresponds to the high-level overview in its entirety.
     *This module is intended to be further fleshed out by you.*
     As you work with this module, try keeping logic to a minimum to help readability.
     If you end up with a lot of signal select logic, consider moving that to a separate
     module.
   + [[./src/main/scala/IF.scala]]
     This is the instruction fetch stage.
     In this stage instruction fetching should happen, meaning you will have to
     add logic for handling branches, jumps, and for exercise 2, stalls.
     The reason this module is already included is that it contains the instruction
     memory, described next which is heavily coupled to the testing harness.
     *This module is intended to be further fleshed out by you.*
   + [[./src/main/scala/IMem.scala]]
     This module contains the instruction memory for your processor.
     Upon testing the test harness loads your program into the instruction memory,
     freeing you from the hassle.
     *You should not modify this module for other purposes than maaaaybe debugging.*
   + [[./src/main/scala/ID.scala]]
     The instruction decode stage.
     The reason this module is included is that the registers reside here, thus
     for the test harness to work it must be wired up to the register unit to
     record its state updates.
     *This module is intended to be further fleshed out by you.*
   + [[./src/main/scala/Registers.scala]]
     Contains the registers for your processor. Note that the zero register is already
     disabled, you do not need to do this yourself.
     The test harness ensures that all register updates are recorded.
     *You should not modify this module for other purposes than maaaaybe debugging.*
   + [[./src/main/scala/MEM.scala]]
     Like ID and IF, the MEM skeleton module is included so that the test harness
     can set up and monitor the data memory
     *This module is intended to be further fleshed out by you.*
   + [[./src/main/scala/DMem.scala]]
     Like the registers and Imem, the DMem is already implemented.
     *You should not modify this module for other purposes than maaaaybe debugging.*
   + [[./src/main/scala/Const.scala]]
     Contains helpful constants for decoding, used by the decoder which is provided.
     *This module may be fleshed out further by you if you so choose.*
   + [[./src/main/scala/Decoder.scala]]
     The decoder shows how to conveniently demux the instruction.
     In the provided ID.scala file a decoder module has already been instantiated.
     You should flesh it out further.
     You may find it useful to alter this module, especially in exercise 2.
     *This module should be further fleshed out by you.*
   + [[./src/main/scala/ToplevelSignals.scala]]
     Contains helpful constants. 
     You should add your own constants here when you find the need for them.
     You are not required to use it at all, but it is very helpful.
     *This module can be further fleshed out by you.*
   + [[./src/main/scala/SetupSignals.scala]]
     You should obviously not modify this file.
     You may choose to create a similar file for debug signals, modeled on how
     the test harness is built.
     *You should not modify this module at all.*
 **  Tests
    In addition to the skeleton files it's useful to take a look at how the tests work.
    You will not need to alter anything here other than the [[./src/test/scala/Manifest.scala][test manifest]], but some
    of these settings can be quite useful to alter.
    The main attraction is the test options. By altering the verbosity settings you
    may change what is output.
    The settings are:
    + printIfSuccessful
      Enables logging on tests that succeed.
      You typically want this turned off, at least for the full test runner.
    + printErrors
      Enables logging of errors. You obviously want this one on, at least on the single
      test.
    + printParsedProgram
      Prints the desugared program. Useful when the test asm contains instructions that
      need to be expanded or altered.
      Unsure what "bnez" means? Turn this setting on and see!
    + printVMtrace
      Enables printing of the VM trace, showing how the ideal machine executes a test.
    + printVMfinal
      Enables printing of the final VM state, showing how the registers look after
      completion. Useful if you want to see what a program returns.
    + printMergedTrace
      Enables printing of a merged trace. With this option enabled you get to see how
      the VM and your processor executed the program side by side.
      This setting is extremely helpful to track down where your program goes wrong!
      This option attempts to synchronize the execution traces as best it can. However,
      once your processor design derails this becomes impossible, leading to rather
      nonsensical output.
      The output should look like this (picture is from exercise 2, without NOP padding)
      #+attr_html: :width 300px
      #+attr_latex: :width 3000px
      [[./Images/merged.png]]
      Instructions that were executed only by the VM or only by your design are colored
      red or blue.
      *IF YOU ARE COLOR BLIND YOU SHOULD ALTER THE DISPLAY COLORS!*
      On some Windows terminal emulators a bug causes colors to display incorrectly,
      giving your terminal a very... rastafarian look as shown below:
      #+attr_html: :width 300px
      #+attr_latex: :width 3000px
      [[./Images/rasta.png]]
    + nopPadded
      Set this to false when you're ready to enter the big-boy league
    + breakPoints
      Not implemented. It's there as a teaser, urging you to implement it so I don't have to.
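    To make the settings above concrete, here is a small sketch of two option sets: a quiet
    one for the full test runner and a verbose one for debugging a single test.
    ~SketchTestOptions~ and ~SketchOptions~ are hypothetical stand-ins written for this
    example; they only carry the settings listed above and are not the harness's actual
    option type, which you should consult in the test manifest.

```scala
// Hypothetical stand-in for the harness's option type; it only holds the
// settings described in the list above.
case class SketchTestOptions(
  printIfSuccessful  : Boolean,
  printErrors        : Boolean,
  printParsedProgram : Boolean,
  printVMtrace       : Boolean,
  printVMfinal       : Boolean,
  printMergedTrace   : Boolean,
  nopPadded          : Boolean,
  breakPoints        : List[Int], // listed as not implemented
  testName           : String
)

object SketchOptions {
  // Quiet settings, suitable when running every test: only errors are logged.
  def quiet(testName: String): SketchTestOptions = SketchTestOptions(
    printIfSuccessful  = false,
    printErrors        = true,
    printParsedProgram = false,
    printVMtrace       = false,
    printVMfinal       = false,
    printMergedTrace   = false,
    nopPadded          = true,
    breakPoints        = Nil,
    testName           = testName
  )

  // Verbose settings for chasing a bug in one test: show the desugared
  // program and the side-by-side trace, even when the test succeeds.
  def debug(testName: String): SketchTestOptions =
    quiet(testName).copy(
      printIfSuccessful  = true,
      printParsedProgram = true,
      printMergedTrace   = true
    )
}
```

    Starting from the quiet set and switching on one printout at a time keeps the log readable.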
--- a/src/main/scala/Decoder.scala
+++ b/src/main/scala/Decoder.scala
@ -8,7 +8,7 @@ import chisel3.util.ListLookup
  * This module is mostly done, but you will have to fill in the blanks in opcodeMap.
  * You may want to add more signals to be decoded in this module depending on your
  * design if you so desire.
-  * 
+  *
  * In the "classic" 5 stage decoder signals such as op1select and immType
  * are not included, however I have added them to my design, and similarily you might
  * find it useful to add more
@ -36,23 +36,23 @@ class Decoder() extends Module {
  val Y = 1.asUInt(1.W)
  /**
-    * In scala we sometimes (ab)use the `->` operator to create tuples. 
+    * In scala we sometimes (ab)use the `->` operator to create tuples.
    * The reason for this is that it serves as convenient sugar to make maps.
-    * 
+    *
    * This doesn't matter to you, just fill in the blanks in the style currently
    * used, I just want to demystify some of the scala magic.
-    * 
+    *
    * `a -> b` == `(a, b)` == `Tuple2(a, b)`
    */
  val opcodeMap: Array[(BitPat, List[UInt])] = Array(
-    // signal      memToReg, regWrite, memRead, memWrite, branch,  jump, branchType,    Op1Select, Op2Select, ImmSelect,    ALUOp
+    // signal      regWrite, memRead, memWrite, branch,  jump, branchType,    Op1Select, Op2Select, ImmSelect,    ALUOp
-    LW     -> List(Y,        Y,        Y,       N,        N,       N,    branchType.DC, rs1,       imm,       ITYPE,        ALUOps.ADD),
+    LW     -> List(Y,        Y,       N,        N,       N,    branchType.DC, rs1,       imm,       ITYPE,        ALUOps.ADD),
-    SW     -> List(N,        N,        N,       Y,        N,       N,    branchType.DC, rs1,       imm,       STYPE,        ALUOps.ADD),
+    SW     -> List(N,        N,       Y,        N,       N,    branchType.DC, rs1,       imm,       STYPE,        ALUOps.ADD),
-    ADD    -> List(N,        Y,        N,       N,        N,       N,    branchType.DC, rs1,       rs2,       ImmFormat.DC, ALUOps.ADD),
+    ADD    -> List(Y,        N,       N,        N,       N,    branchType.DC, rs1,       rs2,       ImmFormat.DC, ALUOps.ADD),
-    SUB    -> List(N,        Y,        N,       N,        N,       N,    branchType.DC, rs1,       rs2,       ImmFormat.DC, ALUOps.SUB),
+    SUB    -> List(Y,        N,       N,        N,       N,    branchType.DC, rs1,       rs2,       ImmFormat.DC, ALUOps.SUB),
    /**
      TODO: Fill in the blanks
@ -60,23 +60,22 @@ class Decoder() extends Module {
    )
-  val NOP = List(N, N, N, N, N, N, branchType.DC, rs1, rs2, ImmFormat.DC, ALUOps.DC)
+  val NOP = List(N, N, N, N, N, branchType.DC, rs1, rs2, ImmFormat.DC, ALUOps.DC)
  val decodedControlSignals = ListLookup(
    io.instruction.asUInt(),
    NOP,
    opcodeMap)
-  io.controlSignals.memToReg   := decodedControlSignals(0)
+  io.controlSignals.regWrite   := decodedControlSignals(0)
-  io.controlSignals.regWrite   := decodedControlSignals(1)
+  io.controlSignals.memRead    := decodedControlSignals(1)
-  io.controlSignals.memRead    := decodedControlSignals(2)
+  io.controlSignals.memWrite   := decodedControlSignals(2)
-  io.controlSignals.memWrite   := decodedControlSignals(3)
+  io.controlSignals.branch     := decodedControlSignals(3)
-  io.controlSignals.branch     := decodedControlSignals(4)
+  io.controlSignals.jump       := decodedControlSignals(4)
  io.controlSignals.jump       := decodedControlSignals(5)
-  io.branchType := decodedControlSignals(6)
+  io.branchType := decodedControlSignals(5)
-  io.op1Select  := decodedControlSignals(7)
+  io.op1Select  := decodedControlSignals(6)
-  io.op2Select  := decodedControlSignals(8)
+  io.op2Select  := decodedControlSignals(7)
-  io.immType    := decodedControlSignals(9)
+  io.immType    := decodedControlSignals(8)
-  io.ALUop      := decodedControlSignals(10)
+  io.ALUop      := decodedControlSignals(9)
 }
--- a/src/main/scala/ToplevelSignals.scala
+++ b/src/main/scala/ToplevelSignals.scala
@ -39,7 +39,6 @@ object Instruction {
 class ControlSignals extends Bundle(){
  val memToReg   = Bool()
  val regWrite   = Bool()
  val memRead    = Bool()
  val memWrite   = Bool()
@ -51,7 +50,6 @@ class ControlSignals extends Bundle(){
 object ControlSignals {
  def nop: ControlSignals = {
    val b = Wire(new ControlSignals)
    b.memToReg   := false.B
    b.regWrite   := false.B
    b.memRead    := false.B
    b.memWrite   := false.B
--- a/src/test/scala/Manifest.scala
+++ b/src/test/scala/Manifest.scala
@ -19,7 +19,7 @@ import LogParser._
 object Manifest {
-  val singleTest = "forward2.s"
+  val singleTest = "addi.s"
  val nopPadded = false
@ -96,3 +96,30 @@ class AllTests extends FlatSpec with Matchers {
    }
  }
 }
 /**
  * Not tested at all
  */
 class AllTestsWindows extends FlatSpec with Matchers {
  it should "just werk" in {
    val werks = getAllWindowsTestNames.filterNot(_ == "convolution.s").map{testname => 
      say(s"testing $testname")
      val opts = Manifest.allTestOptions(testname)
      (testname, TestRunner.run(opts))
    }
    if(werks.foldLeft(true)(_ && _._2))
      say(Console.GREEN + "All tests successful!" + Console.RESET)
    else {
      val success = werks.map(x => if(x._2) 1 else 0).sum
      val total   = werks.size
      say(s"$success/$total tests successful")
      werks.foreach{ case(name, success) =>
        val msg = if(success) Console.GREEN + s"$name successful" + Console.RESET
        else Console.RED + s"$name failed" + Console.RESET
        say(msg)
      }
    }
  }
 }
--- a/src/test/scala/RISCV/testRunner.scala
+++ b/src/test/scala/RISCV/testRunner.scala
@ -23,6 +23,7 @@ case class TestOptions(
  printVMtrace       : Boolean,
  printVMfinal       : Boolean,
  printMergedTrace   : Boolean,
  printBinary        : Boolean,
  nopPadded          : Boolean,
  breakPoints        : List[Int], // Not implemented
  testName           : String,
@ -35,7 +36,8 @@ case class TestResult(
  program    : String,
  vmTrace    : String,
  vmFinal    : String,
-  sideBySide : String
+  sideBySide : String,
  binary     : String
 )
 object TestRunner {
@ -50,7 +52,8 @@ object TestRunner {
        binary.toList.sortBy(_._1.value).map(_._2),
        program.settings,
        finalVM.pc,
-        testOptions.maxSteps)
+        testOptions.maxSteps,
        testOptions.testName)
    } yield {
      val traces = mergeTraces(trace, chiselTrace).map(x => printMergedTraces((x), program))
@ -58,6 +61,7 @@ object TestRunner {
      val vmTraceString = printVMtrace(trace, program)
      val vmFinalState = finalVM.regs.show
      val traceString = printLogSideBySide(trace, chiselTrace, program)
      val binaryString = printBinary(binary)
      val regError = compareRegs(trace, chiselTrace)
      val memError = compareMem(trace, chiselTrace)
@ -68,7 +72,8 @@ object TestRunner {
        programString,
        vmTraceString,
        vmFinalState.toString,
-        traceString)
+        traceString,
        binaryString)
    }
    testResults.left.foreach{ error =>
@ -78,15 +83,16 @@ object TestRunner {
    testResults.map{ testResults =>
      val successful = List(testResults.regError, testResults.memError).flatten.headOption.map(_ => false).getOrElse(true)
      if(successful)
-        say(s"${testOptions.testName} succesful")
+        sayGreen(s"${testOptions.testName} succesful")
      else
-        say(s"${testOptions.testName} failed")
+        sayRed(s"${testOptions.testName} failed")
      if(testOptions.printIfSuccessful && successful){
        if(testOptions.printParsedProgram) say(testResults.program)
        if(testOptions.printVMtrace)       say(testResults.vmTrace)
        if(testOptions.printVMfinal)       say(testResults.vmFinal)
        if(testOptions.printMergedTrace)   say(testResults.sideBySide)
        if(testOptions.printBinary)        say(testResults.binary)
      }
      else{
        if(testOptions.printErrors){
@ -97,6 +103,7 @@ object TestRunner {
        if(testOptions.printVMtrace)       say(testResults.vmTrace)
        if(testOptions.printVMfinal)       say(testResults.vmFinal)
        if(testOptions.printMergedTrace)   say(testResults.sideBySide)
        if(testOptions.printBinary)        say(testResults.binary)
      }
      successful
    }.toOption.getOrElse(false)
--- a/src/test/scala/chiselTestRunner.scala
+++ b/src/test/scala/chiselTestRunner.scala
@ -159,12 +159,18 @@ object ChiselTestRunner {
    binary          : List[Int],
    settings        : List[TestSetting],
    terminalAddress : Addr,
-    maxSteps        : Int): Either[String, (Option[String], List[CircuitTrace])] = {
+    maxSteps        : Int,
    testName        : String): Either[String, (Option[String], List[CircuitTrace])] = {
    var sideEffectExtravaganza: Option[(Option[String], List[CircuitTrace])] = None
    val error: Either[String, Boolean] = scala.util.Try {
-      chisel3.iotesters.Driver(() => new Tile(), "treadle") { c =>
+      chisel3.iotesters.Driver.execute(Array(
                                         "--generate-vcd-output", "on",
                                         "--backend-name", "treadle",
                                         "--target-dir", "waveforms",
                                         "--top-name", testName
                                       ), () => new Tile) { c =>
        new PeekPokeTester(c) {
          val testRunner = new ChiselTestRunner(
            binary,
--- a/src/test/scala/fileUtils.scala
+++ b/src/test/scala/fileUtils.scala
@ -61,7 +61,10 @@ object fileUtils {
  def getAllTests: List[File] = getListOfFilesRecursive(getTestDir.getPath)
      .filter( f => f.getPath.endsWith(".s") )
-  def getAllTestNames: List[String] = getAllTests.map(_.toString.split("/").takeRight(1).mkString)
+  def getAllTestNames: List[String]        = getAllTests.map(_.toString.split("/").takeRight(1).mkString)
  // Not tested.
  def getAllWindowsTestNames: List[String] = getAllTests.map(_.toString.split("\\\\").takeRight(1).mkString)
  def clearTestResults = {
    try {
--- a/theory1.org
+++ b/theory1.org
@ -6,12 +6,15 @@
  when grading these questions, thus even with no implementation at all you
  should still be able to score 100% on the theory questions.
-  All questions can be answered in a few sentences. Remember that brevity is the
+  All questions can be answered in a few sentences. Remember that brevity is wit,
-  soul of wit, and also the key to getting a good score.
+  and also the key to getting a good score.
  You should easily be able to fit your entire answer on a single screen.
 ** Question 1
-   2 points.
+   *2 points.*
 *** Part 1
 **** Part 1½
    *½ points.*
    When decoding the BNE branch instruction in the above assembly program
    #+begin_src asm
    bne x6, x2, "loop",
@ -19,13 +22,27 @@
    In your design, what is the value of each of the control signals below?
    + memToReg
    + regWrite
    + memRead
    + memWrite
    + branch
    + jump
 **** Part 1¼
    *½ points.*
    When decoding the LW instruction in the above assembly program
    #+begin_src asm
    jal x1, 0x10(x1)
    #+end_src
    In your design, what is the value of each of the control signals below?
    + regWrite
    + memRead
    + memWrite
    + branch
    + jump
    Keep in mind that your design and your implementation are separate entities, thus
    you should answer this question based on your ideal design, not your finished 
    implementation.
@ -33,7 +50,6 @@
 *** Part 2
   During execution, at some arbitrary cycle the control signals are:
   + memToReg = 0
   + regWrite = 1
   + memRead  = 0
   + memWrite = 0
@ -48,7 +64,10 @@
   implementation.
 ** Question 2
-   4 points.
+   *4 points.*
   *NO PARTIAL CREDITS*
   Since you can test your solution with the testing framework I will not offer any
   points for a near correct solution to this problem.
   Reading the binary of a RISC-V program you get the following:
@ -77,11 +96,23 @@
   #+end_src
   *Your answer should be in the form of a simple asm program.*
-   (hint 1: the original asm program had a label, you need to infer where that label was)
+   + hint 1: 
-   (hint 2: verify your conclusion by assembling your answer)
+     the original asm program had a label, you need to infer where that label was
   + hint 2: 
     Verify your conclusion by assembling your answer.
     To do this, make an asm program, place it with the rest of the tests and set
     ~printBinary~ to ~true~ in ~singleTestOptions~ in ~Manifest.scala~ which will
     print the full binary of your program.
     As long as your program generates the same binary as the one supplied, your
     program is correct.
 ** Question 3
-   4 points.
+   *4 points.*
   *NO PARTIAL CREDITS*
   Since you can test your solution with the testing framework I will not offer any
   points for a near correct solution to this problem.
   In order to load a large number, LUI and ADDI are used.
   Consider the following program
@ -94,5 +125,37 @@
   #+end_src
   a) Which of these instructions will be split into ADDI LUI pairs?
-   b) Why do the two last instructions need to be handled differently from each other?
+   b) Explain in 3 sentences or less *how* the two last ops are handled differently and *why*.
-   (hint: The parser and assembler in the test suite can help you answer this question)
+   
   + hint 1: 
     The parser and assembler in the test suite can help you answer the first part of
     this question (a).
     Create an asm file, put it with the rest of the tests and run it, setting the correct
     test options in ~singleTestOptions~ defined in ~Manifest.scala~ and observe the output.
   + hint 2:
     While it's probably easier to solve this problem using the internet, you can
     also figure out what is happening by browsing the assembler source code, which
     will hopefully give you a deeper insight into what is going on here.
     Look at ~Parser.scala~, specifically what happens when an ~li~ instruction is parsed.
     When parsing an instruction the parser first attempts to apply the 
     ~singleInstruction~ rule, however this only succeeds if the immediate value
     obeys certain restrictions (~nBits <= 12~), if not it fails.
     If the ~singleInstruction~ rule fails the parser then attempts to apply the
     ~multipleInstructions~ rule instead which expands operations into a list of real ops.
     When this happens the resulting operations are defined as the following:
     #+begin_src scala
     stringWs("li") ~> (reg <~ sep, (hex | int).map(_.splitHiLo(20))).mapN{ case(rd, (hi, lo)) => {
       List(
       ArithImm.add(rd, rd, lo),
       LUI(rd, if(lo > 0) hi else hi+1),
     )}}.map(_.widen[Op]),
     #+end_src
     This is quite a lot to unpack, but you can focus on the line where the ~LUI~ is constructed.
     ~hi~ and ~lo~ are the results of ~splitHiLo~ which splits a 32 bit word into a 20 bit
     upper part (~hi~) and a 12 bit lower part (~lo~).
     Try this for yourself on paper; what happens when ~lo~ ends up being a negative number?
     What is the interplay between incrementing ~hi~ with 1 and adding a ~lo~ that is represented
     as a negative value?
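     If you would rather experiment than work on paper, the arithmetic can be sketched in
     plain Scala. This is a sketch under assumptions, not the assembler's code: ~HiLoSketch~,
     ~split~, ~luiImmediate~ and ~rebuild~ are hypothetical names, ~split~ is only a guess at
     what ~splitHiLo~ computes, and the adjustment is written as ~lo < 0~, which differs
     from the ~lo > 0~ test in the snippet above when ~lo~ is exactly 0.

```scala
object HiLoSketch {
  // Split a 32-bit value into its upper 20 bits and lower 12 bits.
  // The lower part is sign-extended, the way ADDI interprets its immediate.
  def split(value: Int): (Int, Int) = {
    val hi = value >>> 12         // upper 20 bits, zero-filled shift
    val lo = (value << 20) >> 20  // lower 12 bits, sign-extended
    (hi, lo)
  }

  // Assumption of this sketch: when lo is negative, ADDI will subtract,
  // so the value loaded by LUI is bumped by one (kept within 20 bits).
  def luiImmediate(hi: Int, lo: Int): Int =
    if (lo < 0) (hi + 1) & 0xFFFFF else hi

  // The value a register holds after LUI followed by ADDI,
  // using 32-bit wraparound arithmetic.
  def rebuild(value: Int): Int = {
    val (hi, lo) = split(value)
    (luiImmediate(hi, lo) << 12) + lo
  }
}
```

     Calling ~HiLoSketch.split~ on a value whose bit 11 is set shows ~lo~ coming out negative,
     and ~HiLoSketch.rebuild~ lets you check that the pair still adds up to the original value.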
--- a/theory2.org
+++ b/theory2.org
@ -65,7 +65,6 @@
   rs1: 4               ||     rs1: 4              ||      rs1: 1
   rs2: 5               ||     rs2: 6              ||      rs2: 2
   rd:  6               ||     rd:  4              ||      rd:  5
   memToReg = false     ||     memToReg = false    ||      memToReg = false
   regWrite = true      ||     regWrite = false    ||      regWrite = true
   memWrite = false     ||     memWrite = false    ||      memWrite = false
   branch   = false     ||     branch   = true     ||      branch   = false
@ -245,7 +244,7 @@
   For this task it is necessary to use something more sophisticated than ~Map[(Int, Boolean)]~ to represent
   your branch predictor model.
-   The skeleton code is located in ~testRunner.scala~ and can be run using testOnly FiveStage.ProfileTest.
+   The skeleton code is located in ~testRunner.scala~ and can be run using testOnly FiveStage.ProfileBranching.
   With a 2 bit 8 slot scheme, how many mispredicts will happen?
   Answer with a number.