Duke Database Research Group
Welcome to the Duke Database Research Group! We are broadly interested in database and information systems and their applications. The group was established in 2001 and is currently supported by the National Science Foundation, the National Institutes of Health, Duke University, and IBM Corporation.
Our recent research focuses on derived data maintenance. Derived data is the result of applying some transformation, structural or computational, to base data. The use of derived data to facilitate access to base data is a recurring technique in many areas of computer science. Used in hardware and software caches, derived data speeds up access to base data. Used in replicated systems, it improves the reliability and performance of applications in a wide-area network. Used as index structures, it provides fast alternative access paths to base data. Used as materialized views in databases or data warehouses, it improves the performance of complex queries over base data. Used as synopses, it provides fast, approximate answers to queries, or statistics needed for cost-based optimization.
Derived data may vary in complexity: it can be a simple copy of base data, as in caching and replication, or the result of complex transformations, as in indexes and materialized views. Derived data may also vary in accuracy: caches and materialized views are usually exact, while synopses are approximate. Regardless of its form, purpose, complexity, and accuracy, derived data must be maintained when base data is updated. Thus, derived data maintenance is a fundamental problem in computer science. It is also an evolving problem: existing techniques are constantly challenged by the explosive growth in data volume and in the number of data producers and consumers, and by the increasing diversity of data formats.
Traditionally, derived data maintenance has been tackled separately in different contexts, e.g., index updates and materialized view maintenance in databases, and cache coherence and replication protocols in distributed systems. Although they share the same underlying theme, these techniques have been developed and applied largely in isolation. Newer and more complex data management tasks, however, call for creative combinations of these traditionally separate ideas. We are actively investigating techniques and applications of derived data maintenance in the following contexts:
In addition to our work on derived data maintenance, we have tackled a number of other problems, such as bibliographic data extraction and cleansing, incorporating Web search techniques in relational databases, temporal database implementation, and query optimization over heterogeneous data sources. Please refer to our publications for details.
Last updated Mon Aug 21 14:27:49 EDT 2006