CS Prelim Announcement
- From: sri@news.cs.duke.edu (Srikanth T. Srinivasan)
- Newsgroups: duke.cs.os-research
- Subject: CS Prelim Announcement
- Date: 17 Mar 1999 20:18:49 GMT
- Organization: Duke University Department of Computer Science, Durham, NC
- Xref: news.duke.edu duke.cs.os-research:261
DUKE UNIVERSITY
COMPUTER SCIENCE DEPARTMENT
Ph.D. PRELIMINARY EXAMINATION
Load Latency Tolerance in Dynamically Scheduled Processors
Srikanth T. Srinivasan
Advisor: Dr. Alvin R. Lebeck
Levine Science Research Center, D344
Friday, April 2, 1999
1:00 p.m.
The performance gap between processors and memory systems is growing, yet
current caches tend to be naive. Understanding the interaction of the memory
system with the user program, the processor, and the compiler can lead to
better solutions to bridge this gap. In my work, I explore one such
characteristic, load latency tolerance.
Limitations of dynamically scheduled processors due to data dependencies,
finite resources and branch mispredictions cause a variation in the latency
tolerance of loads. My first contribution is to present a method to quantify
this variation. My results show that depending on benchmark and processor
configuration, between 21% and 68% of the loads need to complete in one
cycle, whereas between 2% and 28% of loads can wait for at least 8 cycles
and still achieve IPCs (committed instructions per cycle) within 8% of an
ideal memory system (where all loads complete in one cycle). Traditional
memory systems fail to do a good job in capturing this variation. The
potential increase in performance that can be obtained by incorporating
latency tolerance awareness into traditional memory systems ranges from
14% to 71% across benchmarks.
The second contribution of my work is to increase processor performance by
exploiting load latency tolerance information. I investigate the use of two
schemes, priority scheduling and selective caching, for this purpose. Both
schemes differentiate between loads and give priority to critical loads when
they compete for resources with non-critical loads as well as other
instructions. The goal of priority scheduling is to enable critical loads to
complete early by minimizing the delays incurred by them in the various
queues. Scheduling critical loads earlier in the ready queue, MSHRs, and
the data port queue produces up to a 13% increase in IPC. Selective caching
involves changing the replacement policy in the L1 cache to keep the data of
critical loads closest to the processor, thereby making caches conform to the
processor's load latency requirements. This is accomplished by denying a
non-critical load entry into the L1 cache when it would displace critical
data from the L1 cache. Such non-critical data are allowed to bypass the L1
cache, with the benefits of a shorter latency as well as reduced occupancy on
the L2 bus.
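
As an illustration only, the selective caching rule described above might be
sketched as follows. This is not the simulator or hardware design from the
work itself: the class name SelectiveL1, the fully associative organization,
the LRU victim choice, and the per-line "critical" flag that stays set once
marked are all assumptions made to keep the example small.

```python
from collections import OrderedDict


class SelectiveL1:
    """Toy fully associative L1 with LRU replacement (hypothetical model).

    Each resident block carries a flag saying whether a critical load
    has touched it. A non-critical load may not evict a critical block.
    """

    def __init__(self, num_lines):
        self.num_lines = num_lines
        # block address -> is_critical; iteration order is LRU first
        self.lines = OrderedDict()

    def access(self, block, critical):
        """Return 'hit', 'fill', or 'bypass' for a load to `block`."""
        if block in self.lines:
            # Hit: keep the critical flag sticky and refresh LRU order.
            self.lines[block] = self.lines[block] or critical
            self.lines.move_to_end(block)
            return "hit"
        if len(self.lines) < self.num_lines:
            # Free line available: allocate regardless of criticality.
            self.lines[block] = critical
            return "fill"
        victim = next(iter(self.lines))  # least recently used block
        if self.lines[victim] and not critical:
            # A non-critical load would displace critical data, so the
            # block is served from L2 without being allocated in L1.
            return "bypass"
        del self.lines[victim]
        self.lines[block] = critical
        return "fill"
```

With a two-line cache holding one critical and one non-critical block, a
further non-critical miss bypasses the L1 while the critical block is the LRU
victim, and is allocated normally once a non-critical block becomes the
victim.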
I consider three approaches for gathering load latency tolerance information:
profiling, static compiler analysis, and a dynamic hardware mechanism. My
profiling scheme requires the use of a complex simulator with capabilities
for rolling back the state of the processor and is used only as a
proof-of-concept approach. Compile time analysis suffers from a lack of
branch prediction information. I plan to explore several alternatives to
overcome this limitation. My hardware scheme models the latency tolerance
of a load as a function of dependencies and processor utilization and uses
tables, registers, and associated logic for dynamically keeping track of the
parameters. A cost-benefit analysis of these three approaches remains to be
done.