Homework 5
Due Date : 23:59:59 Thursday December 2
Total Points : 100 pts
You must work on this assignment with one other person. You should
work together on all parts of the assignment, but submit only one set
of solutions. If you each work on only part of the assignment, then you
each learn only part of the material. Please be sure to write both names
on the submitted solutions.
Note: Copying material from Wikipedia, other online sources, or
any other source will not be tolerated. This form of plagiarism has
occurred in the past, and penalties for violating the Duke Community
Standard will be severe.
- Please submit a single PDF file with your solutions
- Please type your answers
- Explain all your answers to get full credit!
- Keep all your answers short and precise!
The Questions
- (15 pts) Assume you have a 2-wide superscalar core with dynamic instruction scheduling. Assume that the dynamic scheduler is optimal: it chooses the very best possible schedule (instead of, say, scheduling the oldest available instructions first). Assume that an instruction must wait for 2 cycles before consuming the data provided by a load instruction. For example, if a load writes to r6 on cycle 1, then an instruction cannot consume r6 until cycle 3. Assume that all instructions take one cycle and that data-dependent instructions cannot execute on the same cycle.
(a) Show the execution of Thread 1 on this core - that is, on what cycle does each instruction execute. What is the utilization of the core, measured in terms of IPC/(superscalar width)? Do the same thing for Thread 2.
(b) Show the simultaneous execution of both Thread 1 and Thread 2 on an SMT core. Assume that the core can fetch instructions from both threads during the same cycle. The core is identical to the core for part (a), except that it has 2 thread contexts. What is the utilization of the core now?
Thread 1
add r1, r2, r3 // r1=r2+r3
load r4, A // r4=Mem[A]
sub r5, r1, r4
mul r6, r5, r2
Thread 2
load r11, B
load r12, C
xor r13, r11, r12
and r16, r13, r12
- (10 pts) H&P 4.3
- (30 pts) H&P 4.16
- (15 pts) Discuss the advantages and disadvantages of using a bus
vs. a 2-dimensional mesh to interconnect processors in a
cache-coherent shared-memory multiprocessor.
- (15 pts) Write a brief summary (2 paragraphs) of the RIFLE paper.
Please ensure you identify and explain the main ideas
and provide your opinion on the strengths and weaknesses of the idea
given everything you've learned this semester.
- (15 pts) Write a brief summary (2 paragraphs) of how DIVA works.
What are the advantages of using a small checker core?
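
Note on the first question: the timing rules and the utilization metric, IPC/(superscalar width), can be sketched in a few lines of Python. This is only an illustrative sketch, not a solution. The function names are hypothetical, the example schedule is made up and is unrelated to Thread 1 or Thread 2, and it assumes that execution time is counted from cycle 1 through the last cycle on which any instruction executes.

```python
def earliest_consumer_cycle(producer_cycle, producer_is_load):
    """Earliest cycle on which a dependent instruction may execute.

    A consumer of a load must wait 2 cycles (load on cycle 1 -> consumer
    on cycle 3 at the earliest). Any other producer takes one cycle, and
    dependent instructions cannot share a cycle, so the consumer runs on
    the next cycle at the earliest.
    """
    return producer_cycle + (2 if producer_is_load else 1)


def utilization(exec_cycles, width):
    """Utilization = IPC / superscalar width.

    exec_cycles lists the (1-based) cycle on which each instruction
    executes. Assumption: total cycles = the last cycle used.
    """
    total_cycles = max(exec_cycles)
    ipc = len(exec_cycles) / total_cycles
    return ipc / width


# Made-up schedule: three instructions executing on cycles 1, 1, and 2
# of a 2-wide core.
example_schedule = [1, 1, 2]
print(utilization(example_schedule, width=2))
```

For the made-up schedule, 3 instructions complete in 2 cycles, so IPC is 1.5 and utilization is 1.5/2 = 0.75. For an SMT core, the same formula applies with the instructions of both threads counted together.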
Submission instructions
- Rename your file to HW5_NetId1_NetId2.pdf where NetId1 and NetId2 are the NetIds for both homework partners (e.g. HW5_ab34_xy16.pdf).
- Go to Duke Blackboard course page -> Tools -> Digital Dropbox -> Send File.
- Under Name, paste the filename without the extension, HW5_NetId1_NetId2.
- Choose the file and click Submit.