Jeffrey S. Chase

Duke University
Department of Computer Science

chase@cs.duke.edu

 

Information about current research projects and lab facilities can be found through the Network/Internet Computing Lab (NICL) and the Internet Systems and Storage Group (ISSG) sites. This page summarizes my research projects as of April 2002. I gratefully acknowledge support for this research from the National Science Foundation, Network Appliance, Hewlett-Packard, Cisco Systems, IBM, Myricom, and Intel Corporation.

Network Storage

As network connectivity grows, storage is increasingly consolidated in network-based storage services to simplify sharing and management. Unfortunately, the Grail of fast, boundlessly scalable, dependable, self-managing network storage is still beyond our reach.

We have several ongoing projects in this area in the ISSG and the subsidiary Network Storage Lab. The Slice project focuses on translucent virtualization for network-attached storage protocols (e.g., NFS). Slice enables construction of robust, scalable virtual storage appliances by aggregating modular server and storage components using redirecting network switches. We are developing new tools (fstress) to characterize the performance of Slice and other network-attached storage architectures. We are also experimenting with the emerging Direct Access File System (DAFS) as a basis for low-overhead network file services over direct-access networks. Finally, we are exploring active storage models for data-intensive systems, in which computing power is integrated into lower levels of the memory/storage hierarchy. Our earlier research in network storage focused on network/OS interface issues, cooperative network memory, file service support for transactional clients, and I/O prefetching.

Utility Computing

Scalable automation of large-scale network services is a key challenge for computing systems research in the next decade. We are investigating software frameworks and policies to manage network services as utilities whose resources are automatically provisioned and sold according to demand, much as electricity is today. The physical resources to deliver these services reside in data centers and edge sites throughout the Internet. Our work is directed at a software infrastructure that continuously adapts the resource assignments for each service to respond to request load, quality-of-service targets (Service Level Agreements), and network conditions.

We are continuing to extend our prototype for Muse, an operating system for a data center. In its current form, Muse allocates resources in a server ensemble across competing Web-based services. Muse also adjusts service placement and the active set of on-power servers to save energy and survive partial failures of the power supply or thermal systems in a data center. In related work, called Cluster On Demand (COD), we are developing tools to automatically apply user-defined software configurations to blocks of servers allocated as virtual clusters, enabling a utility to host diverse applications on a common hardware base. These efforts are key elements of the Opus project, which addresses utility computing for decentralized services that coordinate their activities across multiple sites.
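As a rough illustration of the idea of adjusting the active set of powered-on servers to the offered load, the following minimal sketch computes how many servers to keep powered on. It is not the Muse policy itself: the function name, the fixed per-server capacity, and the target utilization are all hypothetical assumptions made for illustration.

```python
import math


def active_server_count(request_rate, per_server_capacity,
                        target_utilization=0.8, min_servers=1):
    """Return how many servers to keep powered on for the given load.

    Hypothetical sizing rule (an assumption, not Muse's actual policy):
    run each active server at no more than `target_utilization` of its
    capacity, and never drop below `min_servers`.
    """
    if per_server_capacity <= 0:
        raise ValueError("per_server_capacity must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if request_rate < 0:
        raise ValueError("request_rate must be non-negative")

    # Usable capacity per server after leaving headroom for load spikes.
    usable = per_server_capacity * target_utilization
    needed = math.ceil(request_rate / usable)
    return max(min_servers, needed)


if __name__ == "__main__":
    # As load falls, fewer servers stay powered on, which saves energy.
    for load in (1000, 400, 0):
        print(load, active_server_count(load, per_server_capacity=200))
```

A real policy would also have to weigh service-level targets and the cost of powering servers up and down, which this sketch ignores.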

 
Internet Service Infrastructures

Internet applications are increasingly dynamic and decentralized, and a growing number of software applications are hosted as network services accessed from diverse client devices. The key to making these application services reliable, fast, and scalable is to distribute and replicate their functions and data across networks of servers. The mapping of functions and data onto servers must be fluid to respond to changing conditions, particularly in a utility setting.

We are exploring service infrastructures to support decentralized services. An important theme is the use of network-level redirection to route requests to selected servers; redirecting switches are a key element of the Slice and Muse projects. We are exploring a redirection architecture called Anypoint to support a general class of virtualized services using IP-based transports. Anypoint switches host service-specific Application-Layer Routing Modules (ALRMs) that define the policy for content-based request routing, as a manageable form of the Active Networks idea. Our research addresses transport (L4) and service structuring issues for service virtualization using these advanced redirecting switches. In a related project we have prototyped a new Web server called DASH that performs resource control for Web-based services at the user level, without relying on special OS kernel support. DASH is based on a user-level DAFS file system client that enables an event-driven Web server with full control over caching and data movement.
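To make the notion of content-based request routing concrete, here is a minimal sketch of one possible routing policy: hashing a request key (for example, a URL path or file handle) to select a server. It is an illustrative assumption, not the Anypoint or ALRM interface; the function name and the server names are hypothetical.

```python
import hashlib


def route_request(request_key, servers):
    """Pick a server for a request based on its content.

    Hypothetical policy (an assumption, not the Anypoint design): hash
    the request key and map it onto the list of servers, so that all
    requests for the same key reach the same server.
    """
    if not servers:
        raise ValueError("servers must not be empty")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    servers = ["server-a", "server-b", "server-c"]  # hypothetical names
    for key in ("/index.html", "/images/logo.png", "/index.html"):
        print(key, "->", route_request(key, servers))
```

Simple modular hashing remaps most keys whenever the server set changes; a policy meant for a fluid server set would need a scheme that limits that remapping.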

Another project called Ivory addresses automatic state management for Java-based Web services, using Java bytecode transformation as an enabling technology. Ivory's code transformers are based on the JOIE toolkit for on-the-fly transformation, developed by my student Geoff Cohen for his PhD dissertation.

Evaluation methodologies are a critical challenge for research in large-scale service infrastructures. In the Modelnet project we are constructing an emulation environment for wide-area distributed systems, allowing us to run the system as a "ship in a bottle" over a cluster of nodes interconnected through other nodes that emulate the Internet core.

End System Networking

Server-centric computing requires fast, scalable network communication within the data center and across peer sites. In 1995 we launched the Trapeze project to investigate network interface (NIC) techniques and OS structures to harness the potential of emerging high-speed networks to meet this demand. The premise of the project was that rapidly advancing network speeds would promote a shift toward network I/O, exposing fundamental OS structuring issues.

Our work has involved firmware programming in Myrinet and Tigon Gigabit Ethernet NICs to experiment with NIC design and NIC/host interfaces. The project established new performance standards for page transfer latency and network I/O bandwidth in server clusters, earned a patent for self-tuning NIC features to balance transfer latency and bandwidth, and produced open-source OS extensions now in wide use, including high-speed network drivers, FreeBSD/Alpha platform support, and FreeBSD enhancements for low-overhead networking with NFS and TCP/IP. We studied the effects of a range of NIC and OS features on several platforms, demonstrating their behavior at speeds up to 2 Gb/s. This result was reported in the media as the highest point-to-point Internet bandwidth on public record.

With the emergence of Ethernet networks at gigabit and 10-gigabit speeds there is renewed interest in advanced NICs for IP-based networking. I am active in the IETF ROI working group to define standards to enable NIC assist for low-overhead networking with standard IP transports using Remote Direct Memory Access (RDMA). In our research with the Direct Access File System we are experimenting with new NICs that support RDMA and related features for direct-access networking, primarily IP/Ethernet NICs supporting the Virtual Interface Architecture (VI) host interface.

 
Data-Intensive Computing

Storage is an active research area in part due to the explosion of new data, such as genomics/proteomics data and earth sciences data from remote sensing satellites and ground-based sensor networks. In the Geo* project I am collaborating with algorithms researchers in the Center for Geometric Computing and environmental scientists from the Nicholas School of the Environment, who are working on massive-data algorithms and applications, primarily in spatial data domains such as Geographic Information Systems (GIS).

Our research related to these collaborations focuses on improving memory and I/O performance for massive-data algorithms and their implementations. The techniques include application-appropriate buffering, prefetching, caching and data placement in cooperative storage hierarchies including the Slice storage service and the Direct Access File System, and active storage systems that incorporate processing power into the storage nodes. The TerraFlow watershed modeling package is one recent product of these collaborations. We increasingly focus on wide-area sharing of data repositories, a problem that has much in common with my recent research on Internet services and content delivery. This research is a key element of the Department's 1999 NSF Research Infrastructure grant for Data-Intensive Computing with Spatial Models, of which I am the PI. The RI grant funds much of the infrastructure in the ISSG lab and Algorithmics lab.

 
Web Caching and Content Distribution

The popularity of Web-based content and services exposes familiar issues of scale and performance for distributed information sharing in the context of the global Internet. This raises the question of how to decentralize content storage and service sites to improve the performance, scalability, and reliability of Internet services.

My early research in this area investigated techniques for scalable Internet caching and distribution of static Web content. This work was an outgrowth of the CRISP project, a collaboration with Misha Rabinovich at AT&T Research. CRISP explored a continuum of directory management schemes suitable for collective caches of varying size and geographic scale, yielding one patent. Vicinity Cache showed how to build highly scalable caches in which replicated directory information is propagated by gossip and degrades with distance. CRISP led to an improved understanding of content-sharing protocols and their effectiveness in large caching systems, including more recent peer-to-peer data sharing systems and "grid storage". Today, our work related to Web caching is increasingly focused on automated management of caching resources and replication in Internet service utilities, primarily for services with dynamic content.