Homework 3

Due Date: 23:59:59 Friday October 15
Total Points: 100 pts

You must work on this assignment with one other person. You should work together on all parts of the assignment, but submit only one set of solutions. If you each work on only part of the assignment, then you each learn only part of the material. Please be sure to write both names on the submitted solutions.

Note: Copying material from Wikipedia, other online sources, or any source will not be tolerated. This form of plagiarism has occurred in the past, and penalties for violating the Duke Community Standard will be severe.

Keep all your answers short!

Dynamic ILP (60 points)

  1. (5 pts) Give a short example of assembly code that is not helped at all by dynamic scheduling (as compared to in-order execution). Explain why dynamic scheduling does not help its performance.
  2. (5 pts) Compare the Intel P6 style of renaming to the R10K style of register renaming. What are the advantages and disadvantages of each?
  3. (5 pts) Some researchers have proposed pipelining wakeup/select into more than one pipeline stage, in order to allow it to take more time (in nanoseconds) without impacting the clock period. How can pipelining wakeup/select degrade performance?
  4. (10 pts) Assume the R10K pipeline (F, D, S, X, C, R) and an L1 cache with a 1-cycle hit latency and a 5-cycle miss latency. Also assume a load instruction followed by an addition that depends on the load value.
    1. Draw two tables showing the flow of the load and add instructions through the pipeline: one with a cache hit and one with a cache miss. What do you notice about the execution for each of these?
    2. This scenario of a load followed by a dependent instruction occurs very often, and you want to optimize its performance (make the add finish sooner). You're considering having the processor speculate. What should the processor speculate on? (Hint: think of the common case.) In case of a misprediction, what must the processor do to recover (i.e., hide the impact of misspeculation)?
  5. (10 pts) Write a summary (1 page or less) of the Continual Flow Pipelines paper by Intel. What is the motivation? What were the contributions of the paper? What were its strengths and weaknesses? How do you think the paper could have been improved?
  6. (25 pts) H&P 2.12

Pipelining in SimpleScalar (40 points)

Start with the sim-outorder simulator and use only the gcc and go benchmarks. You will NOT have to modify the sim-outorder.c code for this assignment (and thus you don't need to turn in any code), but you will have to pass it different command-line parameters to configure it. If you run sim-outorder without any input parameters, it prints all of the available options, which should help you figure out how to specify the configurations in the following experiments. Please include the "necessary" output to back up your answers to the following questions.
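
One way to keep the different configurations organized is to generate each command line from a small table of options. The following is only a sketch in Python, not a required part of the assignment. The helper name build_sim_outorder_cmd, the simulator path, and the benchmark path and arguments are hypothetical placeholders. The option names shown should be checked against the listing that sim-outorder prints when run with no parameters.

```python
import shlex


def build_sim_outorder_cmd(benchmark_cmd, options, simulator="./sim-outorder"):
    """Return an argv list: simulator, '-name value' pairs, then the benchmark.

    The simulator path is an assumption; point it at your own build.
    """
    argv = [simulator]
    for name, value in options.items():
        # Each option is passed as "-name value".
        argv += [f"-{name}", str(value)]
    return argv + list(benchmark_cmd)


# Hypothetical benchmark binary and input; replace with the course's files.
gcc_cmd = ["benchmarks/cc1.alpha", "-O", "benchmarks/1stmt.i"]

# Example option values chosen only for illustration.
options = {"issue:inorder": "true", "issue:width": 4, "ruu:size": 16}

# Print a command line that can be pasted into a shell.
print(shlex.join(build_sim_outorder_cmd(gcc_cmd, options)))
```

Printing the command rather than running it makes it easy to inspect each configuration and paste it into your answers alongside the simulator output.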

Submit: Although you won't be submitting any code for this assignment, you are required to submit a PDF with your answers.
When you're ready to submit:

  1. Rename your file to HW3_NetId1_NetId2.pdf where NetId1 and NetId2 are the NetIds for both homework partners (e.g. HW3_ab34_xy16.pdf).
  2. Go to Duke Blackboard course page -> Tools -> Digital Dropbox -> Send File.
  3. Under Name, paste the filename, HW3_NetId1_NetId2.
  4. Choose the file and click Submit.
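
As a quick check of the naming convention in step 1, the filename can be built programmatically. This is a minimal sketch in Python. The function name submission_filename is hypothetical, and the alphanumeric check is an assumption based on the example NetIds above, not an official rule.

```python
def submission_filename(netid1, netid2, hw=3):
    """Build the PDF name in the form HW3_NetId1_NetId2.pdf."""
    for netid in (netid1, netid2):
        # Assumption: NetIds are non-empty and alphanumeric, as in "ab34".
        if not netid.isalnum():
            raise ValueError(f"unexpected NetId: {netid!r}")
    return f"HW{hw}_{netid1}_{netid2}.pdf"


print(submission_filename("ab34", "xy16"))
```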