Research
My research interests lie in computer architecture, particularly chip multiprocessor (CMP) design.
Selected Projects:
Master's Project: Analysis of Directory-based Coherence Protocol Implementation
Project Presentation: March 2, 2010
Although directory-based cache coherence protocols scale to many cores, directory access latency limits their performance, and their storage overhead increases as the system grows. Moreover, verifying the correctness of a protocol on a larger system is time-consuming. First, we design optimized directories that reduce the space overhead of the directory structure. Our analytical results show that as memory grows, the advantage of an optimized directory over a conventional or sparse directory increases. Second, we present a preliminary implementation of a fractal directory-based coherence protocol developed by others. This protocol has the distinctive advantage of reducing verification time.
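To make the storage-overhead argument concrete, the following is a minimal sketch of how directory size can be estimated for the two baselines named above: a conventional full bit-vector directory and a sparse directory. It does not model the optimized directory from the project, whose design is not described here. The function names, the 2 state bits per entry, the tag width, and the 2x provisioning factor for the sparse directory are all illustrative assumptions.

```python
def full_map_bits(memory_bytes, block_bytes, num_cores, state_bits=2):
    """Bits used by a full bit-vector directory (assumed layout):
    one entry per memory block, one presence bit per core plus state bits."""
    blocks = memory_bytes // block_bytes
    return blocks * (num_cores + state_bits)


def sparse_bits(total_cache_bytes, block_bytes, num_cores,
                tag_bits, state_bits=2, provisioning=2):
    """Bits used by a sparse directory (assumed layout): entries exist only
    for a multiple of the blocks that fit in the caches, and each entry
    needs a tag in addition to the presence vector and state bits."""
    entries = provisioning * (total_cache_bytes // block_bytes)
    return entries * (num_cores + state_bits + tag_bits)


def overhead_ratio(directory_bits, memory_bytes):
    """Directory size as a fraction of main memory size."""
    return directory_bits / (memory_bytes * 8)


if __name__ == "__main__":
    GIB = 2 ** 30
    MIB = 2 ** 20
    cores = 16
    for mem_gib in (1, 4, 16):
        mem = mem_gib * GIB
        full = full_map_bits(mem, 64, cores)
        # Hypothetical configuration: 1 MiB of cache per core, 20-bit tags.
        sparse = sparse_bits(cores * MIB, 64, cores, tag_bits=20)
        print(f"{mem_gib:>2} GiB: full-map {full / 8 / MIB:.1f} MiB, "
              f"sparse {sparse / 8 / MIB:.1f} MiB, "
              f"full-map overhead {overhead_ratio(full, mem):.4f}")
```

Under these assumptions the full bit-vector directory grows linearly with memory size, while the sparse directory depends only on total cache capacity (ignoring any change in tag width).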
Research Initiation Project (RIP): Directory Optimizations for CMP
Proposal Presentation: September 26, 2008
Final Presentation: May 27, 2009
Chip multiprocessors (CMPs) exploit core-level parallelism to overcome clock frequency limitations and achieve high throughput. Parallelism can also be exploited at finer levels, such as within individual system components. Exploiting this parallelism hides long-latency events, so power and energy savings can be achieved while maintaining high throughput without increasing the clock frequency. In this project, I designed and evaluated a power- and energy-efficient, throughput-oriented directory controller architecture. Simulations and analytical evaluations showed that my approach reduces power and energy consumption without losing performance.
L2 Cache Management at OS-Level
Spring 2008
This project proposed and implemented an OS-level L2 cache management mechanism for CMPs. The design increased the effective capacity of private L2 caches while decreasing the wire delay of shared L2 caches. Because the solution works through OS page allocation, it reduces hardware complexity. We simulated the system in SIMICS, a full-system simulator, together with GEMS, a memory system timing model. Results showed that our solution outperforms both traditional private L2 caches and shared L2 caches for CMPs.
EarlyBIRD: Early Branch Instruction Resolution Device
Fall 2007
EarlyBIRD resolves branch instructions as early as possible so that less speculative work is performed, power is saved, and IPC increases. Experimental analysis showed that this approach performs better than the FIFO and Greedy approaches.
Publication:
[1]. Jie Xiao, Dan Feng, Zhan Shi, Mengfei Cheng. "Flexible Metadata Management for Block-level Storage System", Proceedings of the Seventh ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2006), Las Vegas, Nevada, USA, June 19-20, 2006.