Managing server resources in hosting centers. Our SOSP 2001 paper established the foundations for adaptive resource provisioning in utility data centers [16]. It described an operating system, called Muse, for a Web hosting center, and it was the first work to envision operating system functions at the scale of an entire building containing multiple servers and other cluster components such as redirecting network switches. Muse continuously monitors incoming request traffic and server load levels for multiple hosted services, and dynamically adjusts the allocated server resources to match the load. The resource allocation policy is based on a simple economic formulation that seeks to maximize global benefit (center profit), as defined by utility functions that capture the value of each level of delivered performance for each service. Muse incorporates a simple model-based, utility-maximizing optimizer within a global feedback loop. It addressed the core issues of feedback control in this new setting: filtering the instrumentation data gathered from servers and switches, and calibrating responses to real-world load swings to balance stability and agility.
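A minimal sketch of utility-maximizing allocation, assuming each service's utility is a concave function of the number of servers it receives (diminishing returns). Under that assumption, repeatedly granting the next server to the service with the largest marginal gain maximizes total utility. The `server_cost` parameter is a hypothetical per-server cost (for example, the power to run it), so a server whose marginal value falls below that cost stays idle. This illustrates the economic formulation only; it is not Muse's actual optimizer, and all names are hypothetical.

```python
import heapq
from typing import Callable, Dict


def allocate_servers(
    utilities: Dict[str, Callable[[int], float]],
    total_servers: int,
    server_cost: float = 0.0,
) -> Dict[str, int]:
    """Grant servers one at a time to the service with the largest marginal gain.

    Assumes every utility function u(n) is concave in n, so greedy marginal
    allocation maximizes the sum of utilities. server_cost is a hypothetical
    per-server cost; servers not worth that cost are left idle.
    """
    alloc = {name: 0 for name in utilities}
    # heapq is a min-heap, so store negated gains to pop the largest gain first.
    heap = [(-(u(1) - u(0)), name) for name, u in utilities.items()]
    heapq.heapify(heap)
    for _ in range(total_servers):
        if not heap:
            break
        neg_gain, name = heapq.heappop(heap)
        if -neg_gain <= server_cost:
            break  # best remaining gain does not cover the cost of a server
        alloc[name] += 1
        u, n = utilities[name], alloc[name]
        heapq.heappush(heap, (-(u(n + 1) - u(n)), name))
    return alloc
```

For example, with utilities `ua(n) = 100 * (1 - 0.5**n)` and `ub(n) = 60 * (1 - 0.5**n)` and five servers, the loop grants three servers to the first service and two to the second. Raising `server_cost` shrinks the allocation during light load instead of spreading idle capacity across services.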
Muse was also the first system to explore the role of dynamic provisioning for energy management in hosting centers. It could respond to an unexpected power or cooling ``browndown'' by allocating scarce resources to their highest and best use, or reduce power consumption during periods of light load while holding service quality constant. The issue of server energy and thermal management (now a critical concern) was just beginning to be recognized. The Muse paper demonstrated the potential of coordinated cluster-level energy management without requiring sophisticated energy management features in each server.
Muse and the Slice project (described below) also explored the role of reconfigurable network switches in directing the flow of requests to the server set for each service [12]. We developed this idea further in the Anypoint project [63] and in the dissertation research of doctoral student Ken Yocum.
Networked utilities. We next began to consider how a utility operating system could manage a network of sites hosting widely distributed services. We wrote a concept paper on a hypothetical system (Opus) that exposed core issues for a large-scale network utility [10]: balancing local autonomy with global coordination, optimizing for multiple service quality objectives including consistency and availability, control based on approximate global state, adapting service placement to request locality, and configuring multiple service overlay networks from a pool of shared network resources. Opus set the context for my subsequent work in this area, in collaboration with my colleague Amin Vahdat, who is now at UCSD.
Opus influenced the PlanetLab initiative, which has since created a network utility testbed spanning hundreds of Internet sites. PlanetLab is widely used in the network systems community, and hosts services (such as content distribution) with large user communities. I was engaged with the PlanetLab effort at an early stage. In various research projects I have investigated the relationship of the resource virtualization approach adopted in PlanetLab with resource management approaches used in grid computing middleware (e.g., [52,50]). My research on networked computing utilities continues in the Orca project.
Secure resource peering and accountability. Our 2003 SOSP paper on Secure Highly Available Resource Peering (SHARP) addressed the challenge of managing a PlanetLab-like networked utility as a collection of autonomous sites without central control [27]. SHARP laid the groundwork for an extensible, federated resource economy with accountable contracts. The key to SHARP is a mechanism for cryptographically secure and accountable delegation of control over resources to another actor, such as a broker or a manager for a hosted service, for a bounded period of time (analogous to hierarchical leasing). It addresses key challenges for networked utilities: preserving availability of resources when the actors controlling them fail, decentralized trust management, enforcement of contracts, and reconciling local autonomy with global coordination. SHARP shows how accountable delegation enables a flexible and extensible brokering architecture, making it possible to combine an evolving set of policies for distributed resource management within the utility. We built a SHARP prototype for PlanetLab.
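A minimal sketch of accountable, time-bounded delegation, under stated assumptions: a claim names an issuer, a holder, a number of resource units, and a validity interval, and each subdelegation must fit within the claim it subdivides. Shared-key HMACs from Python's `hmac` module stand in for SHARP's public-key signatures, and the `Claim` layout, `KEYS` registry, and function names are hypothetical rather than SHARP's actual ticket and lease formats.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

# Hypothetical key registry. SHARP actors sign with private keys that others
# verify with public keys; shared HMAC keys stand in for those signatures so
# the sketch runs on the standard library alone.
KEYS: Dict[str, bytes] = {}


@dataclass(frozen=True)
class Claim:
    issuer: str                   # actor delegating control
    holder: str                   # actor receiving control
    units: int                    # resource units delegated
    start: int                    # validity interval is [start, end)
    end: int
    parent: Optional[str] = None  # signature of the claim being subdivided
    sig: str = ""

    def body(self) -> bytes:
        fields = (self.issuer, self.holder, self.units, self.start, self.end, self.parent)
        return "|".join(map(str, fields)).encode()


def _mac(c: Claim) -> Optional[str]:
    key = KEYS.get(c.issuer)
    return None if key is None else hmac.new(key, c.body(), hashlib.sha256).hexdigest()


def sign(c: Claim) -> Claim:
    """Return a copy of c signed by its issuer (KeyError if issuer has no key)."""
    return replace(c, sig=hmac.new(KEYS[c.issuer], c.body(), hashlib.sha256).hexdigest())


def verify_chain(chain: List[Claim]) -> bool:
    """Validate a delegation chain, root claim first."""
    for i, c in enumerate(chain):
        mac = _mac(c)
        if mac is None or not hmac.compare_digest(mac, c.sig):
            return False  # unknown issuer, or forged or altered claim
        if i == 0:
            if c.parent is not None:
                return False  # the root claim must not name a parent
            continue
        p = chain[i - 1]
        if c.parent != p.sig or c.issuer != p.holder:
            return False  # issuer did not hold the claim it subdivides
        if c.units > p.units or c.start < p.start or c.end > p.end:
            return False  # delegated more units or time than it held
    return True


def oversubscribed(parent: Claim, children: List[Claim]) -> bool:
    """True if children signed under `parent` together promise more units than
    it holds at some instant; the signed claims record who over-committed."""
    kids = [c for c in children if c.parent == parent.sig]
    for t in {c.start for c in kids}:  # peak overlap occurs at some start time
        if sum(c.units for c in kids if c.start <= t < c.end) > parent.units:
            return True
    return False
```

Because every claim is signed by the actor that issued it, a chain can be checked by any party holding the verification keys, and any over-commitment is backed by signed evidence rather than by one participant's assertion.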
We are currently exploring economic resource management in a broker community in the Cereus project [34]. Cereus shows how to use the SHARP primitives to implement an accountable virtual currency suitable as a generic medium of exchange to allocate community resources to community members using market principles.
Accountability. We are also considering the broader role of accountability in distributed systems that cross multiple trust domains, including critical infrastructure control. A system is accountable if it provides a means to detect and expose misbehavior by its participants. A system is strongly accountable if it provides a means for each participant to determine for itself if others are behaving correctly, without trusting assertions of misbehavior by another participant who may itself be compromised. Accountability provides powerful incentives to promote cooperation and discourage malicious and incorrect behavior [68,66]. In recent work we developed a toolkit for strongly accountable network services called CATS (Certified Authenticated Tamper-Evident State Store) and showed how to use it to build a strongly accountable network storage service [67].
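Tamper-evident state is commonly built on authenticated data structures. A minimal sketch of that generic building block, using a Merkle hash tree: the store commits to its contents with one root digest (which a real service would sign and publish), and each read returns a proof that a client can check against that root without trusting the server. This is an illustration of the technique only, not the CATS design; `StateStore`, `verify`, and the hashing layout are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, sibling is on the left)


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(key: str, value: str) -> bytes:
    k = key.encode()
    # 0x00 prefix separates leaves from interior nodes; the length prefix
    # keeps the key/value boundary unambiguous.
    return _h(b"\x00" + len(k).to_bytes(4, "big") + k + value.encode())


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


class StateStore:
    """Key-value store that commits to its contents with a Merkle root."""

    def __init__(self, data: Dict[str, str]):
        if not data:
            raise ValueError("sketch assumes a non-empty store")
        self.data = dict(data)
        self.keys = sorted(self.data)
        level = [leaf_hash(k, self.data[k]) for k in self.keys]
        self.levels = [level]
        while len(level) > 1:
            if len(level) % 2:  # odd level: pair the last node with itself
                level = level + [level[-1]]
            level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
            self.levels.append(level)

    @property
    def root(self) -> bytes:
        return self.levels[-1][0]

    def get(self, key: str) -> Tuple[str, Proof]:
        """Return the value and the sibling hashes from its leaf up to the root."""
        i = self.keys.index(key)
        proof: Proof = []
        for level in self.levels[:-1]:
            sib = i ^ 1
            proof.append((level[sib] if sib < len(level) else level[i], sib < i))
            i //= 2
        return self.data[key], proof


def verify(root: bytes, key: str, value: str, proof: Proof) -> bool:
    """Client-side check that (key, value) belongs to the state committed by root."""
    node = leaf_hash(key, value)
    for sib, sib_is_left in proof:
        node = node_hash(sib, node) if sib_is_left else node_hash(node, sib)
    return node == root
```

A server that returns a modified value cannot produce a proof that matches the committed root, so the client detects the tampering on its own, which is the property strong accountability asks for.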
Green data centers. The Muse project led to the first investigation of dynamic thermal management for data centers, in collaboration with Partha Ranganathan and others at HP Labs [53,44,47]. We observe that power and cooling costs make up a significant share of operating costs for a data center, and that vulnerability to failures of cooling equipment increases with server density. When the center is not saturated, dynamic mapping of workload to servers (as in Muse) can play an important role by balancing thermal load across the data center, and placing load where the cooling system is best able to handle it. We have investigated several approaches to inferring the ``thermal topology'' of the data center from continuous sensor monitoring [46,45]. We have developed policies for temperature-aware workload placement and evaluated them using computational fluid dynamics (CFD) simulations with fault injection.
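A deliberately simple illustration of temperature-aware placement: rank servers by how cheaply the cooling system removes heat at their position, and fill the cheapest positions first. The per-server `cooling_cost` coefficient is a hypothetical stand-in for what a thermal-topology model inferred from sensor data might provide; this greedy rule is not one of the policies evaluated in the cited work.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Server:
    name: str
    capacity: float      # maximum load units the server can run
    cooling_cost: float  # hypothetical cost to remove one unit of heat here


def place_load(servers: List[Server], demand: float) -> Dict[str, float]:
    """Assign `demand` load units, filling the cheapest-to-cool servers first.
    Servers left with zero load are candidates for idling or power-down."""
    plan = {s.name: 0.0 for s in servers}
    remaining = demand
    for s in sorted(servers, key=lambda s: s.cooling_cost):
        if remaining <= 0:
            break
        take = min(s.capacity, remaining)
        plan[s.name] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("demand exceeds total capacity")
    return plan


def cooling_bill(servers: List[Server], plan: Dict[str, float]) -> float:
    """Total cooling cost of a plan under the linear per-server cost model."""
    return sum(s.cooling_cost * plan[s.name] for s in servers)
```

The linear, independent cost model is the key simplification: in a real room, heat from one rack can recirculate into the inlets of others, so placement decisions interact, which is why inferring the thermal topology and evaluating policies under realistic airflow matter.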
Our most sophisticated policies can yield up to a factor of two reduction in annual data center cooling costs for representative workloads. This work on ``green'' data centers was the dissertation research of doctoral student Justin Moore, and it received notice in MIT's Technology Review. Partha was honored as a TR-35 Young Innovator for 2007.
With Partha and David Irwin I have also explored approaches to allocate power in dense blade server systems in which the aggregate burst capability of the hardware modules exceeds the power/thermal budget for the combined system enclosure [51].
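When the blades together would draw more than the enclosure budget, something must decide how to divide it. A minimal sketch of one common policy, max-min fair water-filling, chosen here purely for illustration (it is not necessarily the approach studied in [51]): each blade receives its full demand if that fits within an equal share, and the leftover budget is split equally among the blades that want more. Names and wattages are hypothetical.

```python
from typing import Dict


def water_fill(demands: Dict[str, float], budget: float) -> Dict[str, float]:
    """Max-min fair split of an enclosure power budget (watts) among blades."""
    alloc: Dict[str, float] = {}
    pending = dict(demands)
    remaining = budget
    while pending:
        share = remaining / len(pending)
        satisfied = {b: d for b, d in pending.items() if d <= share}
        if not satisfied:
            # Every remaining blade wants more than an equal share: cap them all.
            for b in pending:
                alloc[b] = share
            return alloc
        for b, d in satisfied.items():
            alloc[b] = d          # light blades get exactly what they ask for
            remaining -= d
            del pending[b]
    return alloc
```

For example, with demands of 100, 300, and 400 W and a 600 W budget, the light blade receives its full 100 W and the other two are capped at 250 W each. When total demand fits within the budget, every blade simply gets its demand.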
Self-managing systems. Our experience building dynamic, feedback-controlled systems has drawn us toward fundamental techniques for self-managing or ``autonomic'' systems. We are investigating a range of techniques to infer application profiles or application models from passive observations (e.g., of network traffic) and sensor streams, and to use those models as a basis for automated management of storage and CPU resources across multiple tiers; the first example of this approach is [23].
During my 2003-2004 sabbatical at HP Labs I worked with colleagues there on automatic configuration of storage systems to balance cost and dependability, using an off-the-shelf optimization solver [40]. With another group I investigated statistical induction techniques that use a restricted form of Bayesian network classifier to recognize system states that correlate with service-level agreement violations or failures in a multi-tier service [22]. This approach embodies a rudimentary form of automatic diagnosis: it associates these states with specific combinations of metrics that suggest repair actions.
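The cited work used a restricted form of Bayesian network classifier. As a hedged illustration, the sketch below substitutes the simplest member of that family, a Gaussian naive Bayes classifier that treats each metric as independent given the system state; the class name, metric names, and sample values are hypothetical. Its per-metric log-likelihood terms (`log_terms`) show which metrics pull a sample toward a violation label, a crude analogue of associating states with combinations of metrics.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance
from typing import Dict, List

Sample = Dict[str, float]  # metric name -> observed value


class GaussianNB:
    def fit(self, samples: List[Sample], labels: List[str]) -> "GaussianNB":
        by_label: Dict[str, List[Sample]] = defaultdict(list)
        for x, y in zip(samples, labels):
            by_label[y].append(x)
        self.prior = {y: len(xs) / len(samples) for y, xs in by_label.items()}
        self.params: Dict[str, Dict[str, tuple]] = {}
        for y, xs in by_label.items():
            # Per-label mean and variance of each metric; the small floor keeps
            # the variance positive for metrics that never vary.
            self.params[y] = {
                m: (fmean(x[m] for x in xs), pvariance([x[m] for x in xs]) + 1e-6)
                for m in xs[0]
            }
        return self

    def log_terms(self, x: Sample, y: str) -> Dict[str, float]:
        """Per-metric Gaussian log-likelihood of sample x under label y."""
        return {
            m: -0.5 * math.log(2 * math.pi * var) - (x[m] - mu) ** 2 / (2 * var)
            for m, (mu, var) in self.params[y].items()
        }

    def predict(self, x: Sample) -> str:
        """Label with the highest log prior plus summed per-metric log-likelihoods."""
        return max(
            self.prior,
            key=lambda y: math.log(self.prior[y]) + sum(self.log_terms(x, y).values()),
        )
```

Naive Bayes ignores correlations between metrics; richer Bayesian network structures relax that assumption at the cost of learning more parameters.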
More recently, I have worked with my Duke colleague Shivnath Babu and our doctoral student Piyush Shivam to induce application performance models from execution histories and instrumentation data. In particular, we have explored proactive approaches to speed learning by perturbing the resources available to each execution and observing the effect on performance. Several publications describe this research [54,56,55].