I participated in several earlier research projects in operating systems, cluster computing, and distributed computing. They reflect common themes of resource virtualization, uniform storage and sharing, dynamic resource management, and reconfigurable applications and services.
Distributed cluster computing. Dynamically reconfigurable cluster computing has been a theme of my research since the Amber project, which was one of the earliest systems for parallel cluster computing [14,25]. Amber enabled applications to configure the mapping of data and computation to cluster nodes to trade off the competing performance objectives of load balance and locality. Experiments with the prototype demonstrated the potential of reconfiguring this mapping automatically to respond to changes in the set of physical nodes [25]--foreshadowing my current work in adaptive services hosted on dynamic virtual clusters.
Operating systems for wide-address architectures. My dissertation research on the Opal system was the first of several research projects to experiment with a single address space operating system approach on emerging 64-bit processors. The appeal of uniform sharing in a protected global address space has produced a long history of systems that approached this objective through varying addressing and protection models. Opal established the basic structure and issues for a protected uniform virtual memory model incorporating shared data and storage, and demonstrated how this structure could enable a continuum of protection and sharing relationships among modular software components in order to balance performance and failure isolation. Elements of the Opal research focused on programming environments for data persistence and protected sharing [11], recovery and consistency for distributed data [26], architectural issues for address translation and protection [41], and operating system abstractions and resource management [19].
Cluster memory and storage. My research in network storage systems began with the Global Memory Service (GMS) project at the University of Washington. GMS was a cooperative network memory page caching service that adapted to changes in cluster load and memory demand. GMS was a response to an order-of-magnitude jump in local network bandwidth, which made it faster to fetch a page from the memory of another node (network memory) than to fetch it from a local disk. We integrated GMS with the Trapeze network I/O system described below, and investigated global prefetching [60,4] in the context of GMS. A novel prefetch-safe trace reduction algorithm called FASTSLIM made it possible to evaluate virtual memory caching and prefetching schemes efficiently using trace-driven simulation [38].
After considering various approaches to cluster storage [15,8], we developed an ensemble storage service architecture called Slice [6,7,5]. Slice explored the use of content-based request routing to distribute file service traffic across a dynamic ensemble of servers and network-attached block storage devices, which act together as a unified "virtual file storage appliance". Slice benefits from a dynamic mapping of storage objects to servers, which balances the load across the ensemble without imposing new burdens on users or administrators. Slice was the dissertation research of doctoral student Darrell Anderson.
End-system networking and network storage. The Trapeze project--an outgrowth of our work with network memory--investigated network interface (NIC) device techniques and OS structures to harness the performance potential of emerging high-speed networks. We developed new firmware for programmable Myrinet NICs, along with OS kernel software. Trapeze established new performance standards for page transfer latency and delivered network bandwidth in clusters. It served as a basis for experimentation with novel NIC techniques and related OS structures [65,62,4,8,15,64]. We also used Trapeze to study the effects of NIC and OS features on TCP/IP performance on several platforms [32,17,13]. The Trapeze research improved understanding of the role of advanced network elements in high-performance services, and was useful to industry in designing systems for high-speed Internet networking, particularly in the network storage arena. This research also led to a brief mention in the New York Times, and a licensed patent for self-tuning NIC features to balance transfer latency and bandwidth.
At that time there was a resurgence of industry interest in direct-access NICs supporting "user-level" networking and features for Remote Direct Memory Access (RDMA), primarily for high-performance network storage in data centers. In a partnership with Harvard and Network Appliance, we developed a reference implementation of a proposed Direct Access File System (DAFS) standard, and investigated operating system structures and application-controlled I/O caching in this context [42]. I also played an early role in an Internet (IETF) standards process to promote understanding of RDMA and its interaction with Internet transport protocols and other approaches to low-overhead networking. More recently, we established analytical bounds on the potential benefit from RDMA and other low-overhead networking schemes as a function of key technology-independent ratios [57].
Web content delivery. My work on techniques for scalable Internet content caching and distribution led to an improved understanding of content-sharing protocols and the scale and performance of Internet information sharing. My early work with Misha Rabinovich on the CRISP project showed that the multicast probes then commonly used in distributed proxy caches were unscalable and increased miss costs. We proposed a directory-based approach to allow content sharing across a group of caching sites operating as a unified Internet object cache. Our research explored a continuum of directory management schemes suitable for collective caches of varying size and geographic scale [28,29,31,49], yielding one patent. Vicinity Cache [49] showed how to build highly scalable caches in which replicated directory information is propagated by gossip and degrades with distance. My subsequent work in this area explored the implications of heavy-tailed popularity distributions for "supply-side" content delivery networks [30] and for request distribution policies in Web server clusters [24].
Data-intensive computing. Trapeze was also the initial basis for productive interdisciplinary collaborations with algorithms researchers (Arge, Vitter, and Agarwal) working on massive-data algorithms and applications, primarily in spatial data domains such as Geographic Information Systems (GIS). These collaborations led to two large interdisciplinary NSF grants for which I was the Principal Investigator, combining experimental systems and algorithm engineering. In addition, I was co-PI (with Vitter) on an NSF ITR grant for research on active storage systems and algorithms. Papers related to these grants include [42,9,61,2,1,3].
Executable code rewriting. Amber was an early system to use automatic executable code rewriting to "glue" applications to a system infrastructure for distributed data sharing. Following this theme, we developed the JOIE rewriting toolkit [21] for Java bytecode. JOIE was used in a number of research projects at Duke and elsewhere.
Jeff Chase
2008-12-06