CS 208 Assignment

This exercise is a continuation of the stuff we did with caches during class on Monday, May 27. Nothing to hand in, but you'll likely find it worthwhile to have done this exercise once you get the takehome final.

Goals

Get some practice studying the behavior of different types of caches
Think about the effects of block size and associativity on hit rate and total run time

Code

We'll be pretending to execute the following subroutine, assuming $a0 = 9. That is, we're running the loop below through 8 iterations.

# The fibonacci subroutine. # Precondition: $a0 contains N # Post-condition: $v0 contains the Nth fibonacci number, where # f_0 = f_1 = 1, f_2 = 2, etc. fib: addiu $v0, $zero, 1 # current fibonacci number addiu $t0, $zero, 1 # previous fibonacci number addiu $t1, $zero, 1 # loop counter fibloop: sltu $t2, $t1, $a0 bnez $t2, fibbody jr $ra fibbody: add $t2, $v0, $t0 move $t0, $v0 move $v0, $t2 addiu $t1, $t1, 1 b fibloop

Assumptions

We have one cache + main memory
The program and its data are already loaded into main memory
The address of fib is 0x00400040
The cache starts out empty (i.e. all valid bits are cleared (i.e. set to 0))
Instruction execution time consists of:
- [FETCH] Fetch the instruction into an Instruction Register
- [DATA] If it's a lw, lb, sw, or sb, load or store data
- [EXECUTE] The rest of the datapath work
Times
- FETCH
  - instruction is in the cache already: 20ns
  - instruction is not in the cache yet: 250ns
- DATA
  - data is in the cache already: 20ns
  - data is not in the cache yet: 250ns
- EXECUTE
  - 40ns

Scenario #1

Here's our cache:

direct-mapped
4 rows
4-byte blocks

Using the assumptions listed above, simulate the execution of the fib subroutine, and answer these questions:

How long did it take, in nanoseconds?
What was the cache hit rate? (That is, count the total number of times the desired instruction was in the cache already, and divide it by the total number of instructions executed.)
What are the contents of the cache at the end (i.e. at the time the "jr $ra" instruction is executed)?

Scenario #2

Same questions as Scenario #1, but with this cache:

direct-mapped
16 rows
4-byte blocks

Scenario #3

Same questions as Scenario #1, but with this cache:

direct-mapped
4 rows
16-byte blocks

Scenario #4

Same questions as Scenario #1, but with this cache:

two-way set associative
4 rows
8-byte blocks

Extra questions

How many bytes of data from main memory can each of the caches above contain?
Do you have any intuition about what's more important: total storage space in the cache, degree of associativity, block size, number of rows?
Why might it be desirable to load more than one instruction at a time? Are there any downsides?
Is there anything unrealistic about the assumptions we're making?

CS 208: Computer Organization & Architecture

Practice exercises on caches

Goals

Code

Assumptions

Scenario #1

Scenario #2

Scenario #3

Scenario #4

Extra questions