Flex: A Platform for Experiment-Driven System Management



Project Summary

Despite a number of recent efforts, current solutions for system-administration tasks like benchmarking, tuning, troubleshooting, and capacity planning remain far from satisfactory. Consider an example scenario where a database administrator (DBA) notices a slowdown of the production database due to some unknown cause. The DBA may collect monitoring data on the production database in an attempt to diagnose the problem. However, data collection can increase the load on an already under-performing database, forcing the DBA to shift to the test database. The DBA's usual course of action would be:

Note that the above process required the DBA to do a number of experiments. Each experiment involved setting up the system in a desired configuration, running a specific workload, and collecting instrumentation data for analysis. Experiments were used (i) to better understand the problem, (ii) during the search process for finding the fix, and (iii) for validating that an accurate and stable fix has been found. We call the overall process an instance of experiment-driven management.
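The three parts of an experiment named above (a configuration, a workload, and collected instrumentation data) can be sketched in a few lines of Python. This is only an illustrative sketch, not Flex's actual interface: the `Experiment` class, the `toy_workload` function, the `buffer_mb` parameter, and the latency formula are all hypothetical stand-ins chosen to make the example runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Experiment:
    """One experiment: a configuration, a workload, and the metrics to collect."""
    config: Dict[str, object]
    workload: Callable[[Dict[str, object]], Dict[str, float]]
    metrics: List[str] = field(default_factory=list)

    def run(self) -> Dict[str, float]:
        # Run the workload under the configuration and keep only the requested metrics.
        observed = self.workload(self.config)
        return {name: observed[name] for name in self.metrics}


def toy_workload(config):
    # Hypothetical stand-in for a real workload: latency falls as the buffer grows.
    buffer_mb = config["buffer_mb"]
    return {"latency_ms": 1000.0 / buffer_mb, "buffer_mb": float(buffer_mb)}


def search_for_fix(candidates, target_latency_ms):
    """Run one experiment per candidate configuration, in order.

    Returns the first (config, result) pair that meets the latency target,
    or (None, None) if no candidate does.
    """
    for config in candidates:
        result = Experiment(config, toy_workload, ["latency_ms"]).run()
        if result["latency_ms"] <= target_latency_ms:
            return config, result
    return None, None
```

Calling `search_for_fix([{"buffer_mb": 10}, {"buffer_mb": 50}], 20.0)` runs two experiments and stops at the second configuration. A real search would also repeat the winning experiment to validate that the fix is stable, as in step (iii) above.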

Experiment-driven management is an important piece of the system-administration puzzle that has, until now, been left largely untouched by researchers. The Flex project is our attempt to automate experiment-driven management and bring its benefits to several long-standing problems in databases as well as other systems. More details of Flex's vision can be found in this overview talk or our HotOS 2009 paper.

Flex is supported generously by NSF, startup funds from Duke, and three faculty awards from IBM.

Current Project Members

Collaborators

Alumni

Publications

Overall Vision

On the Experimentation Workbench

On Parameter Tuning

On SQL Tuning

On Query Interactions

On Diagnosis

On Benchmarking and Modeling

On Control- and System-level Issues

Demonstrations