.eX: A Platform for Experiment-Driven System Management
Project Summary
Despite a number of recent efforts, current solutions for system-administration tasks like benchmarking, tuning, troubleshooting, and
capacity planning remain far from satisfactory.
Consider an example scenario where a database administrator (DBA) notices a slowdown of the production database due to some unknown cause. The DBA
may collect some monitoring data on the production database in an attempt to diagnose the problem. However, data collection can increase the load on
an already under-performing database, forcing the DBA to shift to the test database. The DBA's usual course of action would be:
- Create a replica of the production environment
on the test database.
- Get more insight into system behavior by performing runs of
the production workload on the test database, and collecting
instrumentation data. Multiple runs may be required because of system
variability.
- Form hypotheses regarding potential causes of the
performance problem. Do further runs under different system
configurations to refine or confirm these hypotheses. For example,
new indexes, statistics about the data, or resources may be added;
hints may be given to the database query optimizer to force it to
choose specific query execution plans; database configuration
parameter settings may be changed; and so on.
- When a fix is found, possibly after much trial and error, a
careful validation is done to ensure that the fix will work on the
production system. Validation may require multiple runs
to test correctness and stability.
Note that the above process required the DBA to do a number of
experiments. Each experiment involved setting up the system in a
desired configuration, running a specific workload, and collecting
instrumentation data for analysis. Experiments were used (i) to better
understand the problem, (ii) during the search process for finding the fix,
and (iii) for validating that an accurate and stable fix has been found.
We call the overall process an instance of experiment-driven
management.
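The experiment loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not .eX's actual design or API: the names `Experiment`, `simulated_run`, `conduct`, and `best_configuration` are hypothetical, and the "system" is a stand-in function that returns a made-up latency with random noise so that the example is self-contained.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Experiment:
    """One experiment: a system configuration plus a workload to run on it."""
    config: dict
    workload: str
    runs: int = 5  # repeated runs, because of system variability


def simulated_run(config, workload, rng):
    """Hypothetical stand-in for running the workload on the test database.

    Returns a fake latency in milliseconds; a real version would set up the
    test system, replay the workload, and collect instrumentation data.
    """
    base_ms = 100.0
    if config.get("index_on_orders"):  # assumed example of a candidate fix
        base_ms -= 40.0
    return base_ms + rng.gauss(0, 5.0)  # noise models run-to-run variability


def conduct(experiment, run_fn, rng):
    """Run the experiment several times and summarize the measurements."""
    samples = [
        run_fn(experiment.config, experiment.workload, rng)
        for _ in range(experiment.runs)
    ]
    return {
        "samples": samples,
        "mean_ms": statistics.mean(samples),
        "stdev_ms": statistics.stdev(samples),
    }


def best_configuration(candidates, workload, run_fn, seed=0):
    """Try each candidate configuration and return the one with the lowest mean latency."""
    rng = random.Random(seed)
    results = []
    for config in candidates:
        summary = conduct(Experiment(config, workload), run_fn, rng)
        results.append((summary["mean_ms"], config))
    return min(results, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    candidates = [{"index_on_orders": False}, {"index_on_orders": True}]
    print(best_configuration(candidates, "production_trace", simulated_run))
```

The sketch covers only the search step (ii). Understanding the problem (i) and validating the fix (iii) would reuse the same `conduct` routine with different configurations and more runs.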
Experiment-driven management is an important piece of the system administration puzzle
that has largely been left untouched by researchers until
now. The .eX project (pronounced "dotex", like .NET) is our attempt to automate experiment-driven management and bring its
benefits to several long-standing problems in databases as well as other systems. More details of .eX's vision can be found in our
HotOS 2009 paper
or HotAC 2008 paper.
.eX is supported generously by NSF, startup
funds from Duke, and three faculty awards from IBM.
Current Project Members
- Shivnath Babu, Assistant Professor, Duke Computer Science
- Nedyalko Borisov, Ph.D. Candidate, Duke Computer Science
- Songyun Duan, Ph.D. Candidate, Duke Computer Science
- Herodotos Herodotou, Ph.D. Candidate, Duke Computer Science
- Vamsidhar Thummala, Ph.D. Candidate, Duke Computer Science
Collaborators
- Prof. Ashraf Aboulnaga, University of Waterloo
- Mumtaz Ahmad, University of Waterloo
- Prof. Jeff Chase, Duke University
Publications
Overall Vision
On Parameter Tuning
On SQL Tuning
On Query Interactions
On Diagnosis
On Benchmarking and Modeling
- P. Shivam, V. Marupadi, J. Chase, and S. Babu.
Cutting Corners: Workbench Automation for Server Benchmarking
In Proc. of the 2008 USENIX Annual Technical Conference,
June 2008
- P. Shivam, S. Babu, and J. Chase.
Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications
In Proc. of the International Conference on Very Large Data Bases (VLDB), September 2006
- P. Shivam, S. Babu, and J. Chase.
Active Sampling for Accelerated Learning of Performance Models
In Proc. of the First Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML), June 2006
On Control- and System-level Issues
- A. Demberel, J. Chase, and S. Babu.
Reflective Control for an Elastic Cloud Application: An Automated Experiment Workbench
In Proc. of the First Workshop on
Hot Topics in Cloud Computing (HotCloud), in conjunction with USENIX Annual Technical Conference, June 2009
- A. Yumerefendi, P. Shivam, D. Irwin, P. Gunda,
L. Grit, A. Demberel, J. Chase, and S. Babu.
Towards an Autonomic Computing Testbed
In Workshop
on Hot Topics in Autonomic Computing (HotAC), June 2007
Demonstrations
- S. Duan, P. Franklin, V. Thummala, D. Zhao, and S. Babu.
Shaman: A Self-Healing Database System
Demonstrated at the
2009 IEEE International Conference on Data Engineering (ICDE), April 2009
- P. Shivam, A. Demberel, P. Gunda, D. Irwin,
L. Grit, A. Yumerefendi, S. Babu, and J. Chase.
Automated and On-Demand Provisioning of Virtual Machines for Database Applications
Demonstrated at the
2007 ACM Intl. Conf. on Management of Data (SIGMOD 2007), June 2007