Explore Data Science research areas at Duke Computer Science.
AI for social good
Researchers at Duke use the tools of artificial intelligence to help address important societal problems, including (but not limited to) healthcare, antibiotic and cancer drug resistance, criminal justice, fake news detection, allocation of public resources to those who need them, environmental sustainability, energy reliability, and political districting. For many of these applications, the system must satisfy certain interpretability, transparency, ethical, and/or fairness requirements.
Databases
In the world of big data, people collect, store, process, and analyze data on a regular basis. At Duke, we use data to solve many real-world problems, with an emphasis on problems that impact social good, and we aim to make this process as simple, efficient, robust, and secure as possible. Research in data science and data management at Duke covers a wide range of topics:
- Data Science, which includes analyzing data in healthcare, criminal justice, fake news, sports data analysis, and other areas. Duke is particularly strong in methodology related to data science, including model interpretability, causal inference, and computer vision.
- (Mis-)Information Management, which includes automated fact checking, computational journalism, uncertain data management and probabilistic databases, data provenance, and explanations for query answers.
- Secure and Private Data Management, which includes differentially private data analysis with applications in areas such as census data, social networks, location tracking, and search logs.
- Data Processing, which includes query optimization, use of sampling and machine learning methods, and debugging and interactive exploration of query answers.
Interdisciplinary Research in Data Science
At Duke, we use data to solve many real-world problems, with an emphasis on problems that impact social good. This includes work in healthcare, criminal justice, fake news, and other areas. Duke is particularly strong in methodology related to data science, including model interpretability, data privacy, causal inference, and computer vision. Much of this research spans multiple disciplines and is collaborative in nature. In particular, we collaborate with researchers in statistics, mathematics, electrical and computer engineering, economics, public policy, law, biology, medicine, and more.
Machine Learning
Machine learning algorithms allow computers to learn automatically from data to perform complicated tasks in vision, natural language processing, and many other fields. Research at Duke addresses both theoretical and practical aspects of machine learning. In particular, researchers at Duke have made significant contributions to learning interpretable models, non-convex optimization, and the theoretical understanding of neural networks.
Security & Privacy
In this era of big data, protecting the privacy of individuals and the security of computing systems that handle sensitive data have become central challenges in computer science. At Duke, research in this area has focused on four broad directions:
- Differentially Private Data Science, where researchers at Duke have made fundamental contributions to the theory, algorithms, programming frameworks and systems, and social implications of differential privacy.
- Privacy in Mobile Systems, where the focus at Duke is to study novel architectures for enabling privacy in mobile and smart devices as well as privacy enhancing techniques for sharing sensor data (e.g., camera and location) with potentially malicious applications.
- Oblivious and Secure Computation, where the goal at Duke is to advance the theory and application of cryptographic primitives in order to build efficient, practical systems for specific problem domains such as graph computation and differentially private analytics.
- Blockchains, where Duke researchers are making foundational contributions to the field of distributed consensus to scale blockchains in a way that achieves robustness without sacrificing low latency and high throughput.