234 lines
		
	
	
	
		
			9.7 KiB
		
	
	
	
		
			Org Mode
		
	
	
	
	
	
			
		
		
	
	
* Question 1 - Hazards
  For each of the following programs, describe every hazard by type (data or control), line number, and a
  short (at most one sentence) description.

** program 1
  #+begin_src asm
    addi t0,   zero,  10
    addi t1,   zero,  20
  .L2:
    sub  t1,   t1,    t0
    beq  t1,   zero, .L2
    jr   ra
  #+end_src


** program 2
  #+begin_src asm
    addi t0,   zero,  10
    lw   t0,   10(t0)
    beq  t0,   zero,  .L3
    jr   ra
  #+end_src


** program 3
  #+begin_src asm
  lw   t0,   0(t0)
  lw   t1,   4(t0)
  sw   t0,   8(t1)
  lw   t1,   12(t0)
  beq  t0,   t1,  .L3
  jr   ra
  #+end_src


* Question 2 - Handling hazards
  For this question, keep in mind that the forwarder does not care whether the values it forwards are actually used!
  Even for a JAL instruction, which has neither an rs1 nor an rs2 field, the forwarder must still forward its values.
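
  As a rough illustration of the rule above, the choice of forwarding source for a single register number can be
  sketched as follows. This is a minimal sketch and not the design used in the exercise: the names ~Writer~,
  ~Source~ and ~forwardFrom~ are hypothetical, and it assumes that the MEM stage takes priority over WB and that
  register 0 is never forwarded. Note that the function only looks at the register number, not at whether the
  instruction in EX actually uses it.

  #+begin_src scala
  object ForwardingSketch {
    // Hypothetical view of an older instruction: its destination register and
    // whether it writes the register file at all.
    case class Writer(rd: Int, regWrite: Boolean)

    sealed trait Source
    case object ID  extends Source // no match: use the value read in ID
    case object MEM extends Source // forward from the instruction in MEM
    case object WB  extends Source // forward from the instruction in WB

    // Assumption: the instruction in MEM is younger than the one in WB, so it wins.
    // Assumption: register 0 is hardwired to zero and is never forwarded.
    def forwardFrom(reg: Int, mem: Writer, wb: Writer): Source =
      if (reg != 0 && mem.regWrite && mem.rd == reg) MEM
      else if (reg != 0 && wb.regWrite && wb.rd == reg) WB
      else ID
  }
  #+end_src

  With made-up register numbers, ~forwardFrom(7, Writer(7, true), Writer(8, true))~ selects ~MEM~, while the same
  call with ~Writer(7, false)~ in the MEM position falls back to ~ID~, since that instruction never writes a register.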

** Data hazards 1
   At some cycle the following instructions can be found in a 5 stage design:

   #+begin_src text
   EX:                  ||     MEM:                ||      WB:
   ---------------------||-------------------------||--------------------------
   rs1: 4               ||     rs1: 4              ||      rs1: 1
   rs2: 5               ||     rs2: 6              ||      rs2: 2
   rd:  6               ||     rd:  4              ||      rd:  5
   memToReg = false     ||     memToReg = false    ||      memToReg = false
   regWrite = true      ||     regWrite = false    ||      regWrite = true
   memWrite = false     ||     memWrite = false    ||      memWrite = false
   branch   = false     ||     branch   = true     ||      branch   = false
   jump     = false     ||     jump     = false    ||      jump     = false
   #+end_src

   For the operation currently in EX, from which stage (ID, MEM or WB) should the forwarder get the data for rs1 and for rs2?

** Data hazards 2

   At some cycle the following instructions can be found in a 5 stage design:

   #+begin_src text
   EX:                  ||     MEM:                ||      WB:
   ---------------------||-------------------------||--------------------------
   rs1: 1               ||     rs1: 4              ||      rs1: 1
   rs2: 5               ||     rs2: 6              ||      rs2: 0
   rd:  0               ||     rd:  1              ||      rd:  0
   memToReg = false     ||     memToReg = false    ||      memToReg = false
   regWrite = true      ||     regWrite = true     ||      regWrite = true
   memWrite = false     ||     memWrite = false    ||      memWrite = false
   branch   = false     ||     branch   = true     ||      branch   = false
   jump     = true      ||     jump     = true     ||      jump     = false
   #+end_src

   For the operation currently in EX, from which stage (ID, MEM or WB) should the forwarder get the data for rs1 and for rs2?

** Data hazards 3

   At some cycle the following instructions can be found in a 5 stage design:

   #+begin_src text
   EX:                  ||     MEM:                ||      WB:
   ---------------------||-------------------------||--------------------------
   rs1: 2               ||     rs1: 4              ||      rs1: 3
   rs2: 5               ||     rs2: 6              ||      rs2: 4
   rd:  1               ||     rd:  1              ||      rd:  5
   memToReg = false     ||     memToReg = true     ||      memToReg = false
   regWrite = false     ||     regWrite = true     ||      regWrite = true
   memWrite = true      ||     memWrite = false    ||      memWrite = false
   branch   = false     ||     branch   = false    ||      branch   = false
   jump     = false     ||     jump     = false    ||      jump     = false
   #+end_src

   Should the forwarding unit issue a load hazard signal?
   (Hint: what are the semantics of the instruction currently in the EX stage?)

* Question 3 - Branch prediction
  Consider a 2 bit branch predictor with only 4 slots for a 32 bit architecture, where the prediction and
  the next state are determined by the following table:
  #+begin_src text
  state  ||  predict taken  ||  next state if taken  ||  next state if not taken ||
  =======||=================||=======================||==========================||
  00     ||  NO             ||  01                   ||  00                      ||
  01     ||  NO             ||  11                   ||  00                      ||
  10     ||  YES            ||  11                   ||  00                      ||
  11     ||  YES            ||  11                   ||  10                      ||
  #+end_src
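
  The table above can be read as a small state machine. The sketch below is one possible encoding, assuming the
  states are stored as the integers 0 to 3 (binary 00 to 11); the names ~TwoBitPredictor~, ~predictTaken~ and
  ~nextState~ are hypothetical and not part of the skeleton code.

  #+begin_src scala
  object TwoBitPredictor {
    // States 10 and 11 (2 and 3) predict taken, states 00 and 01 (0 and 1) predict not taken.
    def predictTaken(state: Int): Boolean = state >= 2

    // One row of the table per state: (next state if taken, next state if not taken).
    def nextState(state: Int, taken: Boolean): Int = {
      val (ifTaken, ifNotTaken) = state match {
        case 0 => (1, 0)
        case 1 => (3, 0)
        case 2 => (3, 0)
        case 3 => (3, 2)
        case _ => throw new IllegalArgumentException(s"invalid state: $state")
      }
      if (taken) ifTaken else ifNotTaken
    }
  }
  #+end_src

  Starting from state 00, two taken branches in a row move the slot to 11, after which a single not taken branch
  only drops it to 10, so the prediction stays "taken".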

  At some point during execution the program counter is ~0xc~ and the branch predictor table looks like this:
  #+begin_src text
  slot  ||  value
  ======||========
  00    ||  01
  01    ||  00
  10    ||  11
  11    ||  01
  #+end_src

  For the following program:
  #+begin_src asm
  0xc  addi x1, x3, 10
  0x10 add  x2, x1, x1
  0x14 beq  x1, x2, .L1
  0x18 j    .L2
  #+end_src

  Will the predictor predict taken or not taken for the beq instruction?

* Question 4 - Benchmarking
  In order to gauge the performance increase from adding branch predictors it is necessary to do some testing.
  Rather than writing a test from scratch, it is better to reuse the tester that the test harness already uses.
  When running a program the VM outputs a log of all events, including which branches were taken and which
  were not. As it turns out, this is the only information we need to gauge the effectiveness of a branch
  predictor!

  For this exercise you will write a program that parses a log of branch events.

  #+BEGIN_SRC scala
  sealed trait BranchEvent
  case class Taken(addr: Int) extends BranchEvent
  case class NotTaken(addr: Int) extends BranchEvent


  def profile(events: List[BranchEvent]): Int = ???
  #+END_SRC

  To help you get started, I have provided you with much of the necessary code.
  To get an idea of how you should profile branch misses, consider the following profiler, which counts the
  misses for a processor with a 1 bit branch predictor with infinite memory:

  #+BEGIN_SRC scala
  def OneBitInfiniteSlots(events: List[BranchEvent]): Int = {

    // Helper inspects the next element of the event list. If the event is a mispredict the prediction table is updated
    // to reflect this.
    // As long as there are remaining events the helper calls itself recursively on the remainder
    def helper(events: List[BranchEvent], predictionTable: Map[Int, Boolean]): Int = {
      events match {

        // Scala syntax for matching a list with a head element of some type and a tail
        // `case h :: t =>`
        // means we want to match a list with at least a head and a tail (tail can be Nil, so we
        // essentially want to match a list with at least one element)
        // h is the first element of the list, t is the remainder (which can be Nil, aka empty)

        // `case Constructor(arg1, arg2) :: t => `
        // means we want to match a list whose first element is of type Constructor, giving us access to its internal
        // values.

        // `case Constructor(arg1, arg2) :: t if(p(arg1, arg2)) =>`
        // means we want to match a list whose first element is of type Constructor while satisfying some predicate p,
        // called an if guard.

        // Predicted taken, was taken: correct prediction, table unchanged
        case Taken(addr)    :: t if( predictionTable(addr)) => helper(t, predictionTable)
        // Predicted not taken, was taken: miss, flip the bit to taken
        case Taken(addr)    :: t if(!predictionTable(addr)) => 1 + helper(t, predictionTable.updated(addr, true))
        // Predicted taken, was not taken: miss, flip the bit to not taken
        case NotTaken(addr) :: t if( predictionTable(addr)) => 1 + helper(t, predictionTable.updated(addr, false))
        // Predicted not taken, was not taken: correct prediction, table unchanged
        case NotTaken(addr) :: t if(!predictionTable(addr)) => helper(t, predictionTable)
        // No events left
        case _ => 0
      }
    }

    // Initially every possible branch is set to false since the initial state of the predictor is to assume branch not taken
    def initState = events.map{
      case Taken(addr)    => (addr, false)
      case NotTaken(addr) => (addr, false)
    }.toMap

    helper(events, initState)
  }
  #+END_SRC
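
  To make the expected behaviour concrete, here is a compact, self-contained variant of a 1 bit profiler together
  with a tiny hand-made log. It is only a sketch: ~oneBitMisses~ and ~log~ are hypothetical names, it uses a fold
  instead of explicit recursion, and it counts a miss whenever the stored prediction differs from the actual outcome,
  with unseen addresses defaulting to "not taken".

  #+begin_src scala
  object OneBitExample {
    sealed trait BranchEvent
    case class Taken(addr: Int) extends BranchEvent
    case class NotTaken(addr: Int) extends BranchEvent

    // Walks the log once, carrying the prediction table and the miss count along.
    def oneBitMisses(events: List[BranchEvent]): Int = {
      val (_, misses) = events.foldLeft((Map.empty[Int, Boolean], 0)) {
        case ((table, count), event) =>
          val (addr, taken) = event match {
            case Taken(a)    => (a, true)
            case NotTaken(a) => (a, false)
          }
          // Addresses that have not been seen yet are predicted not taken.
          val predicted = table.getOrElse(addr, false)
          // A 1 bit predictor always remembers the most recent outcome.
          (table.updated(addr, taken), if (predicted == taken) count else count + 1)
      }
      misses
    }

    // Made-up log: a loop branch at address 8 that is taken three times and then falls through.
    val log: List[BranchEvent] = List(Taken(8), Taken(8), Taken(8), NotTaken(8))
  }
  #+end_src

  For this log ~OneBitExample.oneBitMisses(OneBitExample.log)~ gives 2: one miss when the loop is first entered and
  one when it exits.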

** Your task
   Your job is to implement a test that checks how many misses occur for a 2 bit branch predictor with 8 slots.
   The rule table is the same as in question 3.
   For simplicity's sake, assume that every value in the table is initialized to 00.

   For this task it is necessary to use something more sophisticated than ~Map[Int, Boolean]~ to represent
   your branch predictor model.

   The skeleton code is located in ~testRunner.scala~ and can be run using ~testOnly FiveStage.ProfileTest~.

   With a 2 bit 8 slot scheme, how many misses will you incur?
   Answer with a number.

* Question 5 - Cache profiling
  Unlike our design, which has a very limited memory pool, real designs have access to vast amounts of memory,
  at the cost of a much higher access latency.
  To mitigate this, a modern processor features several caches, where even the smallest, fastest cache has more
  memory than your entire design.
  In order to investigate how caches can alter performance it is therefore necessary to make some rather
  unrealistic assumptions to see how different cache schemes impact performance.

  We will therefore assume the following:
  + Reads from main memory take 5 cycles
  + The cache has a total storage of 32 words (1024 bits)
  + Cache reads work as they do now (i.e. no additional latency)

  For this exercise you will write a program that parses a log of memory events, similar to the previous task:
  #+BEGIN_SRC scala
  sealed trait MemoryEvent
  case class Write(addr: Int) extends MemoryEvent
  case class Read(addr: Int) extends MemoryEvent


  def profile(events: List[MemoryEvent]): Int = ???
  #+END_SRC

** Your task
   Your job is to implement a model that computes how many delay cycles will occur for a cache which:
   + Follows a 2-way set associative scheme
   + Has a block size of 4 words (128 bits)
   + Is write-through with no write-allocate
   + Uses LRU (least recently used) as its eviction policy

   Your answer should be the number of cache miss latency cycles when using this cache.
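
   The parameters above fix the shape of the cache, which can be worked out with a little arithmetic. The sketch
   below is only an illustration: the names ~CacheGeometry~, ~setIndex~ and ~tag~ are hypothetical, and it assumes
   that the addresses in the log are non-negative byte addresses and that a word is 4 bytes (32 words = 1024 bits).
   If the log uses word addresses instead, ~bytesPerWord~ would be 1.

   #+begin_src scala
   object CacheGeometry {
     val cacheWords   = 32 // total cache storage, from the assumptions for this question
     val blockWords   = 4  // block size in words
     val ways         = 2  // 2-way set associative
     val bytesPerWord = 4  // assumption: byte addressing with 32 bit words

     val blocks     = cacheWords / blockWords   // 8 blocks in total
     val sets       = blocks / ways             // 4 sets of 2 blocks each
     val blockBytes = blockWords * bytesPerWord // 16 bytes per block

     // Which set an address maps to: the block number modulo the number of sets.
     def setIndex(addr: Int): Int = (addr / blockBytes) % sets

     // The tag identifies which block of memory currently occupies a way in that set.
     def tag(addr: Int): Int = addr / (blockBytes * sets)
   }
   #+end_src

   Under these assumptions, addresses ~0x00~ and ~0x40~ both map to set 0 but carry different tags (0 and 1), so
   they can live side by side in the two ways of that set until a third conflicting block forces an LRU eviction.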
