Homework 3
Due Date: 23:59:59, Friday, October 15
Total Points: 100 pts
You must work on this assignment with one other person. You should
work together on all parts of the assignment, but submit only one set
of solutions. If you each work on only part of the assignment, you each learn only part
of the material. Please be sure to write both names on the submitted
solutions.
Note: Copying material from Wikipedia, other online sources, or
any other source will not be tolerated. This form of plagiarism has
occurred in the past, and penalties for violating the Duke Community
Standard will be severe.
Keep all your answers short!
Dynamic ILP (60 points)
- (5 pts) Give a short example of assembly code that is not helped at all by dynamic
scheduling (as compared to in-order execution). Explain why dynamic scheduling does not help
its performance.
- (5 pts) Compare the Intel P6 style of register renaming to the R10K style.
What are the advantages and disadvantages of each?
- (5 pts) Some researchers have proposed pipelining wakeup/select into more than one pipeline stage,
in order to allow it to take more time (in nanoseconds) without impacting the clock period.
How can pipelining wakeup/select degrade performance?
- (10 pts) Assume the R10K pipeline (F, D, S, X, C, R) and an L1 cache with a 1-cycle hit latency and a 5-cycle miss latency.
Also assume a load instruction followed by an add instruction that depends on the loaded value.
- Draw two tables showing the flow of the load and add instructions through the pipeline: one with a cache hit and one with a cache miss.
What do you notice about the execution for each of these?
- This scenario, a load followed by a dependent instruction, occurs very often, and you want to optimize its performance (make the add finish sooner).
You're considering having the processor speculate. What should the processor speculate on? (Hint: think of the common case.)
In case of a misprediction, what must the processor do to recover (i.e., hide the impact of misspeculation)?
- (10 pts) Write a summary (1 page or less) of the Continual Flow Pipelines paper by Intel.
What is the motivation?
What were the contributions of the paper?
What were its strengths and weaknesses? How do you think the paper could have been improved?
- (25 pts) H&P 2.12
Pipelining in SimpleScalar (40 points)
Start with the sim-outorder simulator and just use the gcc and go benchmarks.
You will NOT have to modify the sim-outorder.c code for this assignment (and thus you don't need
to turn in any code), but you will have to feed it different command line parameters to configure it.
If you run sim-outorder without any input parameters, it will spit out all of the possibilities,
which should help you to figure out how to specify the configurations in the following experiments.
Please include the output necessary to back up your answers to the following questions.
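When collecting the output that backs up your answers, it can help to pull the relevant statistics out of each run programmatically. The following is a minimal Python sketch, assuming the simulator prints statistics as lines of the form "name value # description" (for example, a sim_IPC line). The helper names parse_stats and speedup are hypothetical, and the numbers in SAMPLE_OUTPUT are made up for illustration; they are not real results.

```python
import re

# Assumed statistics line format: "<name> <number> # <description>".
_STAT_LINE = re.compile(
    r"^([\w.:]+)[ \t]+(-?\d+(?:\.\d+)?)[ \t]+#", re.MULTILINE
)

# Made-up sample text for illustration only; NOT real simulator results.
SAMPLE_OUTPUT = """\
sim_num_insn 1000000 # total number of instructions committed
sim_cycle 800000 # total simulation time in cycles
sim_IPC 1.2500 # instructions per cycle
"""


def parse_stats(text):
    """Return a dict mapping each statistic name to its numeric value."""
    return {name: float(value) for name, value in _STAT_LINE.findall(text)}


def speedup(baseline_ipc, new_ipc):
    """Ratio of the new IPC to the baseline IPC (greater than 1 is faster)."""
    return new_ipc / baseline_ipc


if __name__ == "__main__":
    stats = parse_stats(SAMPLE_OUTPUT)
    print("IPC:", stats["sim_IPC"])
```

Comparing IPC is meaningful here because every configuration runs the same benchmark with the same instruction count, so a higher IPC corresponds directly to fewer cycles.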
- Experiment #1:
Compare in-order versus out-of-order execution (hint: the default is out-of-order,
and there's a flag that can change this). Don't change any other flags. What do you observe?
For the rest of the experiments, assume an out-of-order core.
- Experiment #2:
Evaluate the importance of the Register Update Unit (RUU) size by comparing a size of 32 vs. a size of
64. As with all experiments here, don't change anything else.
Explain your result - that is,
why did the change in RUU size have a small/big impact?
- Experiment #3:
Evaluate the importance of superscalar width by comparing a 4-wide pipeline to an 8-wide pipeline.
Remember that you want to balance (set to equal values) the widths of decode, issue, and commit.
Is the performance benefit of going from 4-wide to 8-wide worth the hardware and power costs?
Explain why or why not.
- Experiment #4:
For a 4-wide pipeline, evaluate the impact of the number of integer ALUs by comparing a processor
with 2 ALUs to a processor with 4 ALUs. What does this result tell you about the number of ALUs necessary to
avoid structural hazards?
- Analysis:
Given what you learned from these 4 experiments, explain
- which issues are most important for performance
- where you might choose a lower-performing design point for reasons of power efficiency and cost-effectiveness.
Justify your answers!
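One way to keep the four experiments organized is to describe each configuration as data and generate the command lines from it. The sketch below is an illustration under stated assumptions, not a required workflow: the option names are written as I recall them from the simulator's help output, so confirm each one by running sim-outorder without arguments before relying on it. SIM_PATH, build_command, the experiment labels, and the benchmark placeholder are all hypothetical.

```python
# Hypothetical location of the simulator binary; adjust for your setup.
SIM_PATH = "./sim-outorder"

# Option names are assumptions to verify against the simulator's help output.
EXPERIMENTS = {
    "exp1_in_order": {"-issue:inorder": "true"},
    "exp1_out_of_order": {},  # simulator defaults
    "exp2_ruu32": {"-ruu:size": 32},
    "exp2_ruu64": {"-ruu:size": 64},
    "exp3_4wide": {"-decode:width": 4, "-issue:width": 4, "-commit:width": 4},
    "exp3_8wide": {"-decode:width": 8, "-issue:width": 8, "-commit:width": 8},
    # Widths are set explicitly so the 4-wide assumption is visible.
    "exp4_2alu": {"-decode:width": 4, "-issue:width": 4, "-commit:width": 4,
                  "-res:ialu": 2},
    "exp4_4alu": {"-decode:width": 4, "-issue:width": 4, "-commit:width": 4,
                  "-res:ialu": 4},
}


def build_command(options, benchmark_cmd):
    """Return the argument list: simulator, its options, then the benchmark."""
    cmd = [SIM_PATH]
    for flag, value in options.items():
        cmd.extend([flag, str(value)])
    return cmd + list(benchmark_cmd)


if __name__ == "__main__":
    # Placeholder benchmark command; substitute the real gcc or go invocation.
    benchmark = ["path/to/benchmark_binary"]
    for label, options in EXPERIMENTS.items():
        print(label, " ".join(build_command(options, benchmark)))
```

Keeping each pair of configurations identical except for the one parameter under study makes it easier to attribute any performance difference to that parameter alone.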
Submit: Although you won't be submitting any code for this assignment, you are required to submit a PDF with your answers.
When you're ready to submit:
- Rename your file to HW3_NetId1_NetId2.pdf where NetId1 and NetId2 are the NetIds for both homework partners (e.g. HW3_ab34_xy16.pdf).
- Go to the Duke Blackboard course page -> Tools -> Digital Dropbox -> Send File.
- Under name, paste the filename, HW3_NetId1_NetId2.
- Choose the file and click Submit.