This exercise is a continuation of the stuff we did with caches
during class on Monday, May 27. Nothing to hand in, but you'll
likely find it worthwhile to have done this exercise once you
get the takehome final.
Goals
- Get some practice studying the behavior of different
types of caches
- Think about the effects of block size and associativity
on hit rate and total run time
Code
We'll be pretending to execute the following subroutine, assuming
$a0 = 9. That is, we're running the loop below through 8 iterations.
# The fibonacci subroutine.
# Precondition: $a0 contains N
# Post-condition: $v0 contains the Nth fibonacci number, where
# f_0 = f_1 = 1, f_2 = 2, etc.
fib:
addiu $v0, $zero, 1 # current fibonacci number
addiu $t0, $zero, 1 # previous fibonacci number
addiu $t1, $zero, 1 # loop counter
fibloop:
sltu $t2, $t1, $a0
bnez $t2, fibbody
jr $ra
fibbody:
add $t2, $v0, $t0
move $t0, $v0
move $v0, $t2
addiu $t1, $t1, 1
b fibloop
Assumptions
- We have one cache + main memory
- The program and its data are already
loaded into main memory
- The address of fib is 0x00400040
- The cache starts out empty (i.e. all valid
bits are cleared (i.e. set to 0))
- Instruction execution time consists of:
- [FETCH] Fetch the instruction into an Instruction Register
- [DATA] If it's a lw, lb, sw, or sb, load or store data
- [EXECUTE] The rest of the datapath work
- Times
- FETCH
- instruction is in the cache already: 20ns
- instruction is not in the cache yet: 250ns
- DATA
- data is in the cache already: 20ns
- data is not in the cache yet: 250ns
- EXECUTE
Scenario #1
Here's our cache:
- direct-mapped
- 4 rows
- 4-byte blocks
Using the assumptions listed above, simulate the execution of the fib
subroutine, and answer these questions:
- How long did it take, in nanoseconds?
- What was the cache hit rate? (That is, count the total number
of times the desired instruction was in the cache already, and
divide it by the total number of instructions executed.)
- What are the contents of the cache at the end (i.e. at the time the
"jr $ra" instruction is executed)?
Scenario #2
Same questions as Scenario #1, but with this cache:
- direct-mapped
- 16 rows
- 4-byte blocks
Scenario #3
Same questions as Scenario #1, but with this cache:
- direct-mapped
- 4 rows
- 16-byte blocks
Scenario #4
Same questions as Scenario #1, but with this cache:
- two-way set associative
- 4 rows
- 8-byte blocks
Extra questions
- How many bytes of data from main memory can each of the
caches above contain?
- Do you have any intuition about what's more important: total
storage space in the cache, degree of associativity, block size,
number of rows?
- Why might it be desirable to load more than one instruction at a time?
Are there any downsides?
- Is there anything unrealistic about the assumptions we're making?