Project Schedule and Guidelines
See this document for the project schedule.
Guidelines for the project proposal:
The project proposal is due by noon on Oct 5. The proposal will be graded
and should include (i) a description of the problem, (ii) the motivation
for the problem (e.g., why the problem is interesting, why it is
challenging, who will benefit from a solution to the problem,
etc.), (iii) your initial ideas on how to attack the
problem, and (iv) a brief discussion of previous work related to the problem.
There is no page limit for the proposal.
Here are some further readings for each of the project topics. To
access some of the following links (e.g., papers in the ACM
Digital Library), you need to be on the Duke network.
Query Optimization in Database Systems
Guy Lohman's talk on Self-Managing DB2,
which gives an overview of his group's recent work on query optimization.
The Picasso project and a
As we discussed in class, the goals of query optimization have changed
over the years. Here is a paper
on robust query optimization.
The following paper is the first technical paper on the LEO system
that Volker Markl talked about.
Michael Stillger, Guy M. Lohman, Volker Markl, Mokhtar Kandil: LEO -
DB2's LEarning Optimizer.
A new and improved version of this paper is also available.
A less technical but more forward-looking paper on the LEO
project appeared in the
IBM Systems Journal.
Adaptive Query Processing in Database Systems
A recent paper on changing a query's plan when a problem is detected
while the query is running:
Volker Markl, Vijayshankar Raman, David E. Simmen, Guy M. Lohman,
Hamid Pirahesh: Robust Query Processing through Progressive Optimization.
An attempt by Shivnath and colleagues
to correct some problems with the above approach:
Query Execution in Database Systems
A paper on Interaction-Aware Query Processing and Scheduling.
A paper on query suspension and resumption.
A paper on estimating time to completion of a query plan.
Data Stream Systems
Two recent projects on building data stream management systems:
Here are two overview papers: from STREAM and
Adaptive query processing in a data stream management system. Shivnath's
slides on adaptive query processing and an
Work on load shedding, which copes with high stream arrival rates
by gracefully reducing the accuracy of query results:
Configuration of Database Systems: Physical Design (e.g., Indexes and
Configuration of Database Systems: Resources and Configuration Parameters
from IBM on automated configuration of application servers.
A paper on our project at Duke on
Active and Accelerated
Learning of Cost Models for Optimizing Scientific Applications, with
extensions to web services, database servers, storage servers, etc.
IBM DB2's Configuration Advisor.
Databases + Information Retrieval (DB+IR)
A paper on Google's system architecture. The paper is outdated, but the
basic principles remain relevant.
Some papers from IBM on the DB+IR problem:
Self-Healing Database Systems
A paper from Oracle on quick identification of
Work from IBM on automated scheduling of statistics updates for DB2:
2 (a non-technical
A paper from IBM on identifying distinct symptoms for
different causes of DB2 failures.
Here are installation instructions for DB2
on the Duke CS research cluster.
Some useful information on running DB2 on the Duke CS research clusters is available from the CPS116 web site.