Memory Systems & Massive Data Management Initiative

A common theme in the research of many faculty in the department is memory systems and massive data management. With advances in technology, massive amounts of data are becoming readily available at relatively low cost. For example, experiments on CERN's accelerators, satellite images from NASA, CT scans and MRI images of brains, and signals from vast sensor arrays generate terabytes to petabytes of data every day. Current technology is not adequate to cope with data at this scale. Addressing these issues requires expertise in a wide range of topics, including algorithmic techniques, database systems, computer architecture, distributed computing, and networking. The department's memory systems and data management group is unique in its vertically integrated approach to these issues, which spans the architectural, algorithmic, and database management levels.

At the architectural level, the goals are (i) to explore new processor and memory architectures that improve overall performance by addressing memory-system bottlenecks, and (ii) to develop techniques for laying out data in memory and restructuring program code to improve memory performance. We are exploring a variety of techniques to address fundamental issues that arise from continually changing technology and workloads. Our research efforts include high-performance microprocessors, multithreaded systems, nanoarchitecture, dependable computing, and energy-efficient computing.
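As a minimal illustration of restructuring program code for memory performance, the sketch below applies loop tiling (blocking) to a matrix transpose. Tiling is a standard textbook transformation chosen here for illustration; it is not presented as one of the group's own techniques, and the function names and the default tile size of 32 are hypothetical. The matrix is stored row-major in a flat list. The naive loop reads the source contiguously but writes the output with a stride of n, while the tiled version confines each burst of work to a tile x tile block whose source and destination cache lines can stay resident. Python is used only to show the loop structure; the cache benefit itself shows up in compiled languages such as C, where the same transformation applies directly.

```python
def transpose_naive(a, n):
    """Transpose an n x n matrix stored row-major in the flat list `a`."""
    out = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            # Reads a[] sequentially, but writes out[] with stride n.
            out[j * n + i] = a[i * n + j]
    return out


def transpose_tiled(a, n, tile=32):
    """Same result as transpose_naive, visiting the matrix in tile x tile blocks.

    `tile` is a hypothetical tuning parameter; in practice it would be chosen
    so that one source block and one destination block fit in cache together.
    """
    out = [0] * (n * n)
    for ii in range(0, n, tile):          # top-left row of the current block
        for jj in range(0, n, tile):      # top-left column of the current block
            # min(...) handles edge blocks when n is not a multiple of tile.
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j * n + i] = a[i * n + j]
    return out


if __name__ == "__main__":
    n = 5
    a = list(range(n * n))
    assert transpose_tiled(a, n, tile=2) == transpose_naive(a, n)
```

Both functions produce identical output; only the order in which memory is visited changes, which is what makes tiling safe to apply as a purely mechanical code transformation.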

At the algorithmic level, the goal is to develop algorithms and data structures that (i) are optimized for the memory hierarchy and minimize accesses to secondary memory, (ii) can handle long streams of data in real time, and (iii) can trade off efficiency against accuracy. I/O efficiency, approximation, dynamization, and randomization are recurring themes in our work.
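To make the efficiency/accuracy tradeoff concrete for data streams, the following sketch implements the Misra-Gries frequent-items summary, a well-known one-pass streaming algorithm used here as an illustrative example rather than as the group's own method. With k counters it uses O(k) memory regardless of the stream length m, and each item's estimated count falls short of its true count by at most m/(k+1): every decrement step discards k+1 occurrences (one from each of the k counters plus the arriving item), so at most m/(k+1) such steps can occur. The function name `misra_gries` and the parameter `k` are our own labels.

```python
from collections import Counter


def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k counters.

    Returns a dict mapping items to estimated counts. For every item x,
    true_count(x) - m/(k+1) <= estimate(x) <= true_count(x), where m is
    the stream length and items absent from the dict have estimate 0.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # All k counters are in use: decrement each one and drop any
            # that reach zero. The arriving x is discarded as well.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    stream = ["a"] * 50 + ["b"] * 30 + [str(i) for i in range(20)]
    sketch = misra_gries(stream, k=4)
    truth = Counter(stream)
    print(sketch.get("a", 0), truth["a"])
```

Increasing k tightens the error bound at the cost of more memory, which is exactly the kind of tradeoff between efficiency and accuracy described above.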

Research in database management spans data management systems and their applications. Our recent research focuses on maintaining derived data, which facilitates access to the base data. Derived data is obtained by applying structural or computational transformations to base data, and it must be updated whenever the base data changes. Explosive growth in the size and variety of formats of data raises a number of challenges in this area. In one project, we are designing derived data that handles the rich structure of XML, an emerging standard for data exchange over the Internet whose data model consists of semistructured graphs. In another project, we are developing efficient indexing schemes for multidimensional spatial data and multidimensional data streams.
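A simple instance of derived data is a materialized aggregate, such as a per-key total over a base table. The sketch below maintains such totals incrementally: each insertion into or deletion from the base data adjusts only the affected total instead of recomputing the aggregate from scratch. The class `SumView` and its methods are hypothetical names, and this relational toy example is only meant to show the general idea of keeping derived data consistent with base data, not the group's XML-specific techniques. It keeps a per-key row count so that a key disappears from the derived data once all of its base rows are deleted.

```python
from collections import defaultdict


class SumView:
    """Hypothetical derived data: total `amount` per `key`, maintained incrementally."""

    def __init__(self):
        self.base = []                   # base data: list of (key, amount) rows
        self.totals = defaultdict(int)   # derived data: key -> sum of amounts
        self.counts = defaultdict(int)   # number of base rows per key

    def insert(self, key, amount):
        self.base.append((key, amount))
        # Incremental update: touch only the affected group.
        self.totals[key] += amount
        self.counts[key] += 1

    def delete(self, key, amount):
        self.base.remove((key, amount))  # raises ValueError if the row is absent
        self.totals[key] -= amount
        self.counts[key] -= 1
        if self.counts[key] == 0:
            # Last base row for this key is gone, so drop the group entirely.
            del self.totals[key]
            del self.counts[key]

    def recompute(self):
        """Full recomputation from base data, used to check the incremental result."""
        fresh = defaultdict(int)
        for key, amount in self.base:
            fresh[key] += amount
        return dict(fresh)


if __name__ == "__main__":
    v = SumView()
    v.insert("x", 5)
    v.insert("y", 3)
    v.insert("x", 2)
    v.delete("y", 3)
    print(dict(v.totals), v.recompute())
```

The incremental path costs time proportional to the size of the update, while `recompute` costs time proportional to the whole base table; that gap is what makes maintaining derived data worthwhile as data volumes grow.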

The group has been engaged in several multidisciplinary collaborative projects. In collaboration with the Pratt School of Engineering, it is experimenting with high-density nanoscale architectures that will enhance the ability to access massive data and improve performance in computationally demanding environments. A recent collaborative effort with the Medical School aims to develop tools for integrating data spaces for medical and clinical research. Another project, in collaboration with the School of Environment, focuses on terrain modeling and ecological forecasting.