Homework 4
Due Date: 23:59:59, Thursday, November 11
Total Points: 100 pts
You must work on this assignment with one other person. You should
work together on all parts of the assignment, but submit only one set
of solutions. If you each work on only part of the assignment, you will
each learn only part of the material. Please be sure to write both names
on the submitted solutions.
Note: Copying material from Wikipedia, other online sources, or
any other source will not be tolerated. This form of plagiarism has
occurred in the past, and penalties for violating the Duke Community
Standard will be severe.
- Please submit a single PDF file with your solutions
- Please type your answers
- Explain all your answers to get full credit!
- Keep all your answers short and precise!
- Make sure to include simulation output data you use for answers
Cache Memory (50 points)
- (5 pts) Assume you have a cache with exactly 8 frames and that
there is one word per frame. Compare direct-mapped vs. fully
associative for the following stream of word accesses: 1, 2, 3, 4,
5, 6, 7, 8, 9, 1, 2, 3, 4, ...
(repeating sequence of 9 addresses). What lesson do you learn from
this experiment? Would a victim cache help?
- (15 pts) H&P 5.1 a, b, c (not d). Note that you should use the online
version of Cacti.
- (20 pts) H&P 5.2 (all of it). Note that you should use the online version
of Cacti.
- (10 pts) H&P 5.6 (a and b only)
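For the first question above, a small simulation can be used to check hand-counted misses. The following is a minimal sketch, not a reference solution. It assumes that the direct-mapped cache indexes frames by word address modulo the number of frames, and that the fully associative cache uses LRU replacement; the assignment does not state a replacement policy, so LRU is an assumption. All function names are hypothetical.

```python
from collections import OrderedDict


def repeating_stream(num_distinct, repetitions):
    """Return the word-address stream 1..num_distinct repeated `repetitions` times."""
    return [addr for _ in range(repetitions) for addr in range(1, num_distinct + 1)]


def direct_mapped_misses(addresses, num_frames):
    """Count misses for a direct-mapped cache with one word per frame.

    Assumption: the frame index is (word address mod num_frames).
    """
    frames = [None] * num_frames
    misses = 0
    for addr in addresses:
        index = addr % num_frames
        if frames[index] != addr:
            misses += 1
            frames[index] = addr  # replace whatever was in this frame
    return misses


def fully_associative_lru_misses(addresses, num_frames):
    """Count misses for a fully associative cache with LRU replacement (assumed)."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for addr in addresses:
        if addr in cache:
            cache.move_to_end(addr)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= num_frames:
                cache.popitem(last=False)  # evict the least recently used word
            cache[addr] = True
    return misses


if __name__ == "__main__":
    stream = repeating_stream(num_distinct=9, repetitions=10)
    print("accesses:", len(stream))
    print("direct-mapped misses:", direct_mapped_misses(stream, 8))
    print("fully associative (LRU) misses:", fully_associative_lru_misses(stream, 8))
```

Changing the replacement policy or the number of distinct addresses is a quick way to see how sensitive each organization is to the access pattern. You must still explain the result in your own words.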
Cache configurations in SimpleScalar (50 points)
Experiments:
- Use the sim-cache executable for 3 benchmarks (anagram, gcc and go) to evaluate
the performance of the following L1 data cache (D$) configurations:
- 1-cycle direct-mapped cache
- 2-cycle 2-way set associative cache
- 3-cycle 4-way set associative cache
Evaluate each of these for a data cache size of 1KB (not including tags).
Since sim-cache does not report timing, compute it yourself from the
instruction counts, the miss rate, and the following parameters:
- Clock cycle time: 0.5ns
- Miss penalty: 300 cycles
- Cycles for instructions other than load/stores: 1 cycle
Note: Remember that each time you double the associativity, the number of sets
halves. You can keep everything else at the simulator's default value.
- Now use sim-outorder to evaluate the relationship between
out-of-order execution and L1 data cache organization. Using a 1KB direct-mapped
cache with hit latencies of 1 cycle, 2 cycles and 4 cycles, simulate the following
processor configurations using the 2 benchmarks gcc and go (3 hit latencies x 6
processor configurations = 18 configurations in total, each run on both benchmarks):
- 2-wide in-order
- 4-wide in-order
- 2-wide-issue with 4-entry RUU
- 2-wide-issue with 32-entry RUU
- 4-wide-issue with 4-entry RUU
- 4-wide-issue with 32-entry RUU
Note: For the in-order part you need to use sim-outorder with the in-order flag enabled.
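For the sim-cache experiment above, the timing calculation can be organized as in the following sketch. It is one reasonable reading of the stated parameters, not the required model: it assumes every load/store pays the hit latency of the cache configuration, a miss adds the full 300-cycle penalty on top of that, and every other instruction takes 1 cycle. The 32-byte block size used in the example is an assumption; check the default in your copy of the simulator. All function and parameter names are hypothetical.

```python
CLOCK_NS = 0.5              # clock cycle time from the assignment
MISS_PENALTY_CYCLES = 300   # miss penalty from the assignment
NON_MEM_CYCLES = 1          # cycles per non-load/store instruction


def num_sets(cache_bytes, block_bytes, associativity):
    """Sets = capacity / (block size * ways); doubling the ways halves the sets."""
    return cache_bytes // (block_bytes * associativity)


def execution_time_ns(total_insts, mem_insts, miss_rate, hit_cycles):
    """Estimate execution time from sim-cache style counts.

    Assumed model: each load/store costs hit_cycles, plus the miss penalty
    weighted by the D$ miss rate; all other instructions cost 1 cycle.
    """
    other_insts = total_insts - mem_insts
    mem_cycles = mem_insts * (hit_cycles + miss_rate * MISS_PENALTY_CYCLES)
    total_cycles = other_insts * NON_MEM_CYCLES + mem_cycles
    return total_cycles * CLOCK_NS


if __name__ == "__main__":
    # Set counts for a 1KB cache, assuming 32-byte blocks.
    for ways in (1, 2, 4):
        print(f"{ways}-way: {num_sets(1024, 32, ways)} sets")
    # Made-up counts purely to exercise the formula; use your own simulation output.
    print(execution_time_ns(total_insts=1000, mem_insts=200, miss_rate=0.5, hit_cycles=1), "ns")
```

State the model you actually use in your write-up, since a different treatment of the hit latency on a miss changes the absolute numbers.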
Analysis: Explain the relative impact of data cache access latency with respect
to issue width, in-order vs. out-of-order, and with respect to RUU size (run
more experiments if you need to). Also comment on the relative power
consumption of each design. Be sure to use the correct cycle
count, not "simulation time", when comparing performance.
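To avoid mixing up the cycle count with the wall-clock simulation time, the statistics can be pulled out of the simulator output with a short script. The sketch below assumes each statistic is printed on its own line as a name, a value, and a `#` comment, and that the cycle and instruction counts are named `sim_cycle` and `sim_num_insn`; verify both assumptions against your own output. The sample text is made up for illustration, and the function names are hypothetical.

```python
def parse_stats(text):
    """Parse lines of the assumed form '<name> <value> # <description>' into a dict."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) != 2:
            continue  # not a simple name/value statistic line
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # value is not numeric; skip it
    return stats


def cycles_per_instruction(stats):
    """CPI from the simulated cycle count, not from elapsed wall-clock time."""
    return stats["sim_cycle"] / stats["sim_num_insn"]


if __name__ == "__main__":
    # Made-up sample output; the real values come from your simulation runs.
    sample = """
    sim_num_insn       1000 # total number of instructions committed
    sim_elapsed_time      3 # total simulation time in seconds
    sim_cycle          2500 # total simulation time in cycles
    """
    stats = parse_stats(sample)
    print("cycles:", stats["sim_cycle"])
    print("CPI:", cycles_per_instruction(stats))
```

Comparing cycles (or CPI) across configurations keeps the comparison independent of how fast the host machine ran the simulation.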
Submission instructions
- Rename your file to HW4_NetId1_NetId2.pdf where NetId1 and NetId2 are the NetIds for both homework partners (e.g. HW4_ab34_xy16.pdf).
- Go to Duke Blackboard course page -> Tools -> Digital Dropbox -> Send File.
- Under Name, paste the filename, HW4_NetId1_NetId2.
- Choose the file and click Submit.