New Internet Computing Lab (NICL)
NICL explores system software infrastructures for the New Internet.
We focus on hosting platforms for network services, including network
computing and network storage.
In particular, our research deals with networked utility systems that share
and provision a virtualizable resource substrate (computers, network elements,
storage) on demand for a wide range of networked services. Hosted services
(guests) can include computational grids and cloud computing middleware, as
well as new kinds of networked systems that run the substrate resources under
their control in entirely different ways. We proposed the notion of computing
on a "server cloud" in talks in 2001-2002 (Servers in the Mist [PDF]), and it
has been a cornerstone of our research agenda ever since.
A key research challenge is to build autonomic services
and cloud control systems that can manage themselves and hold human
administrative burdens constant as the system scales.
The real promise of networked cloud computing is to enable dynamic, adaptive
network application services that can deploy wherever resources are available
and demand exists, and that can sense and respond to changes in traffic
demands or resource conditions, adapting automatically.
If we are successful, utility substrates and control services
will evolve into an open public infrastructure requiring
flexible, secure, robust, and decentralized control.
The control architecture must resolve the
"tussle" of contending demands, changing priorities, and rapidly
advancing technology, all within the framework of a self-sustaining
system.
Projects
- Orca: Open Resource Control Architecture. Orca is a
control plane architecture for an Internet operating system based on a foundational abstraction of resource
leasing. Orca is also an open-source
software release incorporating the latest software from several earlier and ongoing projects:
  - Shirako is a resource leasing toolkit implemented in Java. It defines
    Java interfaces to incorporate new resources and services into an Orca control plane.
  - Cluster-On-Demand (COD) is a site manager for reconfigurable mixed-use
    clusters. COD guests use Shirako leasing interfaces to allocate and configure virtual clusters.
  - Automat is an interactive Web-based laboratory for autonomic
    services and data centers. It includes a Web portal for deploying and monitoring controller
    modules that manage dynamic resource provisioning for Shirako/COD sites and the services deployed on
    them. It also provides a test harness for subjecting services and controllers to various workloads
    and faultloads.
    See the paper from Hot Topics in Autonomic Computing (HOTAC 2007) [PDF].
  - SHARP is an architecture for secure highly available resource peering. SHARP defined
    an initial form of the leasing abstraction used in Orca, and secure
    delegation mechanisms to exchange leases as accountable contracts among self-interested actors.
- Virtual grids and cloud computing middleware. The Orca software is a platform for diverse guest
environments that schedule workloads on "clouds" of infrastructure resources. These include job
management services, virtual desktop services, and middleware for data-intensive cluster computing,
such as Hadoop/MapReduce. We have run several such systems as pluggable "guest packages" above the
guest-neutral Orca platform: Globus, Sun GridEngine (SGE), Hadoop, and a VM-based job execution
service called JAWS. We are exploring how to configure guests on demand, share resources among multiple
guest instances on a common substrate, and adapt resource assignments as conditions change. We also
collaborate with the Virtual Workspaces project.
- Statistical learning for performance control, diagnosis, and repair. NIMO (NonInvasive Modeling for Optimization) investigates
statistical approaches to learn models of application behavior by observing guest applications under various
conditions. These models can serve as a basis for effective and efficient resource management
and fault diagnosis in networked utilities. We are also exploring approaches
to workbench automation that use active learning and proactive design
of experiments to learn such models quickly.
- CATS (Certified Authenticated Tamper-Evident Services) investigates fundamental structures
for accountable network services in which misbehavior by a participant is detectable by
other participants and provable to a third party. Accountability is important for any network service with
multiple self-interested actors, including Orca systems with contract-based resource sharing.
- Economic resource exchange. Cereus investigates market-based
mechanisms with virtual currencies as a basis for allocating resources in a shared substrate.
Such schemes may be implemented as
collections of controller modules in an Orca system that supports accountable contracts.
- Green Computing. A key advantage of flexible compute clouds is
the ability to manage server resources to save energy. Beginning in 2001, we wrote a series of papers on
adaptive thermal/power management for data centers,
most of them in collaboration with recent PhD graduate Justin Moore (now at Google)
and Partha Ranganathan of HP Labs,
who was named one of the world's top young innovators in 2007.
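As a rough illustration of the resource leasing abstraction mentioned under Orca above, the following is a minimal sketch in Python. It is not the Orca or Shirako API (Shirako itself is written in Java); the `Lease` and `Site` classes, their methods, and the integer clock are all hypothetical names chosen for this example. It assumes a single site with a pool of interchangeable resource units, and shows only the core idea that a guest holds resources for a bounded term and that the units return to the pool when the term ends unless the lease is extended.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """A time-bounded contract for some resource units (hypothetical model)."""
    guest: str
    units: int
    start: int  # logical clock tick at which the lease begins
    end: int    # first tick at which the lease is no longer valid

    def active(self, now: int) -> bool:
        return self.start <= now < self.end


class Site:
    """Toy resource provider with a fixed pool of interchangeable units."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.leases = []

    def available(self, now: int) -> int:
        # Units held by expired leases count as free again.
        used = sum(l.units for l in self.leases if l.active(now))
        return self.capacity - used

    def request(self, guest: str, units: int, now: int, term: int) -> Optional[Lease]:
        """Grant a lease if enough units are free at `now`; otherwise refuse."""
        if units <= 0 or term <= 0 or units > self.available(now):
            return None
        lease = Lease(guest, units, now, now + term)
        self.leases.append(lease)
        return lease

    def extend(self, lease: Lease, now: int, term: int) -> bool:
        """Renew a lease that is still active; an expired lease must be re-requested."""
        if term <= 0 or not lease.active(now):
            return False
        lease.end += term
        return True


# Two guests compete for a 10-unit site.
site = Site(capacity=10)
hadoop = site.request("hadoop", units=6, now=0, term=5)  # granted
sge = site.request("sge", units=6, now=1, term=5)        # refused: only 4 units free
free_mid = site.available(now=3)                         # hadoop lease still active
free_after = site.available(now=5)                       # hadoop lease has expired
```

A real control plane would add much that this sketch omits, such as brokering between sites, configuration of leased resources, and the accountable contracts described for SHARP and CATS.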
People
Faculty and Staff
Collaborating faculty at Duke:
Staff:
Graduate Students
Recently graduated PhD students:
Funding
We are grateful to the National Science Foundation for our major funding. NSF has funded our work most recently through:
- CNS-0720829, CSR-VCM: Foundations for a Programmable Self-Managing Hosting Center, with Shivnath Babu.
- CNS-0509408, CSR-AES: Virtual Playgrounds: Making Virtual Distributed Computing Real, with Kate Keahey and Ian Foster.
- ANI 03-30658, NMI: Collaborative Research: A Grid Service for Dynamic Virtual Clusters.
We are also grateful to IBM Corporation for funding Orca-related research through several Faculty Awards and
SUR equipment grants from 2003 to 2008. HP and Network Appliance have also funded predecessors of the project.