Homework 4

Due Date: 23:59:59, Thursday, November 11
Total Points: 100 pts


You must work on this assignment with one other person. You should work together on all parts of the assignment, but submit only one set of solutions. If each of you works on only part of the assignment, each of you learns only part of the material. Please be sure to write both names on the submitted solutions.

Note: Copying material from Wikipedia, other online sources, or any source will not be tolerated. This form of plagiarism has occurred in the past, and penalties for violating the Duke Community Standard will be severe.

Cache Memory (50 points)

  1. (5 pts) Assume you have a cache with exactly 8 frames and that there is one word per frame. Compare direct-mapped vs. fully associative for the following stream of word accesses: 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, ... (repeating sequence of 9 addresses). What lesson do you learn from this experiment? Would a victim cache help?
  2. (15 pts) H&P 5.1 a, b, c (not d). Note that you should use the online version of Cacti.
  3. (20 pts) H&P 5.2 (all of it). Note that you should use the online version of Cacti.
  4. (10 pts) H&P 5.6 (a and b only)
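
For question 1, a small simulator can help you check a hand trace. The following is a minimal sketch, not a reference solution. It assumes one word per frame, that the direct-mapped index is the word address modulo the number of frames, and that the fully associative cache uses LRU replacement (the problem does not state a replacement policy, so LRU is an assumption). The function names are hypothetical.

```python
from collections import OrderedDict


def simulate_direct_mapped(stream, num_frames):
    """Count hits in a direct-mapped cache with one word per frame."""
    frames = [None] * num_frames  # frames[i] holds the word address cached in frame i
    hits = 0
    for addr in stream:
        index = addr % num_frames  # assumed mapping: address modulo frame count
        if frames[index] == addr:
            hits += 1
        else:
            frames[index] = addr  # miss: replace whatever occupied this frame
    return hits


def simulate_fully_associative_lru(stream, num_frames):
    """Count hits in a fully associative cache, assuming LRU replacement."""
    lru = OrderedDict()  # keys are cached addresses, ordered oldest to newest use
    hits = 0
    for addr in stream:
        if addr in lru:
            hits += 1
            lru.move_to_end(addr)  # mark as most recently used
        else:
            if len(lru) >= num_frames:
                lru.popitem(last=False)  # evict the least recently used address
            lru[addr] = True
    return hits


if __name__ == "__main__":
    # Toy stream, deliberately not the one from question 1.
    toy = [0, 4, 0, 4]
    print("direct-mapped hits:", simulate_direct_mapped(toy, 4))
    print("fully associative (LRU) hits:", simulate_fully_associative_lru(toy, 4))
```

In the toy stream, addresses 0 and 4 map to the same frame of a 4-frame direct-mapped cache, so every access misses, while the fully associative cache holds both and hits on the second pass. Run the stream from question 1 yourself and explain the result in your own words.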

Cache configurations in SimpleScalar (50 points)

Experiments:

  1. Use the sim-cache executable for 3 benchmarks (anagram, gcc, and go) to evaluate the performance of the following L1 data cache (D$) configurations:
    Evaluate each of these for a data cache size of 1KB (not including tags).
    Since sim-cache does not report timing, calculate it yourself from the instruction counts, the miss rate, and the following cycle counts:
    Note: Remember that as you double the associativity, the number of sets halves. You can keep everything else at the simulator's default values.

  2. Now use sim-outorder to evaluate the relationship between out-of-order execution and L1 data cache organization. Using a 1KB direct-mapped cache with hit latencies of 1 cycle, 2 cycles, and 4 cycles, simulate the following configurations using the 2 benchmarks gcc and go (a total of 18 configurations):
    Note: For the in-order part, you need to use sim-outorder with the in-order flag enabled.

    Analysis: Explain the relative impact of data cache access latency with respect to issue width, in-order vs. out-of-order execution, and RUU size (run more experiments if you need to). Also comment on the relative power consumption of each design. Be sure to compare performance using the correct cycle count, not the "simulation time".
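
For experiment 1, the arithmetic behind the note on sets and associativity, and one common way to turn a miss rate into a timing estimate, can be sketched as follows. This is a sketch under stated assumptions: the 32-byte block size is only an example value (check the simulator's default), the hit and miss-penalty cycle counts are placeholders for the ones given in the assignment, and the average-access-time formula is a standard textbook model, not necessarily the exact calculation expected here. The function names are hypothetical.

```python
def num_sets(cache_bytes, block_bytes, associativity):
    """Number of sets = data capacity / (block size * associativity)."""
    sets, remainder = divmod(cache_bytes, block_bytes * associativity)
    if sets == 0 or remainder != 0:
        raise ValueError("capacity must be a positive multiple of block size * associativity")
    return sets


def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Textbook average memory access time: hit time + miss rate * miss penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    # 1KB of data with an assumed 32-byte block: doubling associativity halves the sets.
    for assoc in (1, 2, 4):
        print("associativity", assoc, "->", num_sets(1024, 32, assoc), "sets")
    # Placeholder numbers only; substitute the measured miss rate and given cycle counts.
    print("average cycles per access:", average_access_cycles(1, 0.10, 10))
```

Keeping the set count consistent with the associativity is what holds the data capacity at 1KB across configurations.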

Submission instructions

  1. Rename your file to HW4_NetId1_NetId2.pdf where NetId1 and NetId2 are the NetIds for both homework partners (e.g. HW4_ab34_xy16.pdf).
  2. Go to the Duke Blackboard course page -> Tools -> Digital Dropbox -> Send File.
  3. Under Name, paste the filename, HW4_NetId1_NetId2.
  4. Choose the file and click Submit.