We consider a participatory budgeting problem in which each voter submits a proposal for how to divide a single divisible resource (such as money or time) among several possible alternatives (such as public projects or activities), and these proposals must be aggregated into a single consensus division.
The era of information explosion has opened up an unprecedented opportunity to study the social, political, financial, and medical events described in natural language text. While the past decades have seen significant progress in deep learning and natural language processing (NLP), it is still extremely difficult to analyze textual data at the event level, e.g., to understand what is happening, what its causes and impacts are, and how events will unfold over time.
Data-driven decision making plays a dominant role across domains, from health and business to government and sports. These data-driven decisions are often ad hoc and resource-intensive: a bank may have to compare and analyze all of its users, and organizers of sporting events might use previous events to estimate an acceptable ticket sales rate. In this dissertation, I describe efficient methods for optimizing complex analytic queries.
Computational protein design is a transformative field with exciting prospects for advancing both basic science and translational medical research. New algorithms blend discrete and continuous mathematics to address the challenges of creating designer proteins. I will discuss recent progress in this area and some interesting open problems.
When do you allow existing software to evolve, and when do you throw it all away and start over? This is a question that all software developers will face many times during their careers, in ways both small and large. Too often there is a strong urge to start over with a clean slate, getting rid of all the cruft that has built up over the years. Unfortunately, the decision to succumb to this urge is often made with limited guidance and little clear reasoning.
Once upon a time, Computer Systems was a broad field encompassing everything from hardware to software. The incredible growth and success that our field has experienced over the past half century has had the side effect of transforming systems into a constellation of siloed fields. I'm going to make the case that we should return to a broad interpretation of systems, undertake bolder, higher-risk projects, and be intentional about how we interact with other fields. I'll support the case with examples of several research projects that embody this approach.
The Project Showcase is held annually by the Computer Science Department to highlight independent or team-based research and project work done during the academic year. The best projects in each category will be decided by faculty judges and announced towards the end of the event. Come explore the impressive efforts of our students and their advisors!
The ability to explore counterfactuals, the “what ifs?”, is central to the study of causal inference. Randomized controlled trials and A/B testing provide an approach when the counterfactuals can be tested simultaneously. However, in many scenarios, such as policy evaluation, this is not feasible: we cannot have two Massachusetts, one with gun control and the other without, at the same time, in order to evaluate the impact of gun control on the crime rate!
Integrating MNase-seq and RNA-seq Time Series Data to Study Dynamic Chromatin and Transcriptional Regulation Under Cadmium Stress
Though the sequence of the genome is essentially fixed, within each cell it exists in a complex and changing state, determined in part by the dynamic binding of proteins. These proteins—including nucleosomes, transcription factors (TFs), polymerases, and other complexes—define the living chromatin state of the genome. Understanding genome-wide how the dynamics of chromatin interact with the dynamics of transcriptional regulation remains a fundamental research problem.
The rapid development of cloud computing and enterprise IT demands that modern data centers be less expensive and more efficient. Alongside improvements in software design such as concurrency control, caching, and leasing, we also need to exploit emerging networking technologies to reduce communication overhead. Remote Direct Memory Access (RDMA), a networking technology originally used in High-Performance Computing (HPC), is a trending technology for inter-node connections.
This project attempted to uncover hidden patterns in nanocomposite data. Natural Language Processing techniques were used to analyze materials science papers in order to discover relationships between different materials science terms. The project also sought to characterize the relationship between the glass-transition temperature (Tg) and other available features, as well as the relationship between the shape parameter and other variables such as matrix type. Several models were used, including LASSO regression, Support Vector Machines with a Gaussian kernel, and Decision Trees.
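To illustrate the kind of sparse regression the abstract names, here is a minimal, self-contained sketch of LASSO fitted by cyclic coordinate descent on synthetic data. The data, penalty value, and feature count are all invented for illustration; the project itself presumably used off-the-shelf model implementations.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent with soft-thresholding.
    Minimizes (1/2n)*||y - Xw||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j's current contribution
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([2.0, 0.0, -1.5, 0.0, 0.0])  # only two informative features
y = X @ true_w + 0.1 * rng.normal(size=200)
w = lasso_cd(X, y, lam=0.1)
```

The L1 penalty drives the coefficients of the uninformative features exactly to zero, which is why LASSO is a natural choice when, as here, one wants to discover which of many candidate features actually relate to a target such as Tg.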
Recently, convolutional neural networks (CNNs) have become popular for image recognition tasks due to their excellent performance compared to earlier approaches. One limitation of CNNs, however, is that they require substantial quantities of hand-labeled training imagery before they achieve their performance advantage over other models. To address this, multi-task learning (MTL) has been proposed, in which a single CNN is trained to perform several recognition tasks simultaneously.
Given statistics about a basketball game, we would like to generate interesting factlets about players' performances, e.g., "in NCAA tournament game on March 29, Rowan Barrett became the first player to have at least 11 assists in a game against Virginia Tech in Duke history." Such factlets are often used in media reporting and for fan engagement. Time is of the essence for this application, yet finding all such claims is a time-consuming task. In this project, we use Apache Spark on Google Cloud to parallelize the analysis so it can be completed in a speedy and economical manner.
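A minimal, pure-Python sketch of the per-claim check at the heart of this kind of factlet mining might look as follows. The records, player names, and threshold are invented for illustration; the project itself evaluates many such candidate claims in parallel with Apache Spark.

```python
# Hypothetical game-log records: (player, opponent, assists, date)
games = [
    ("A. Smith",   "Virginia Tech",  9, "2015-02-01"),
    ("B. Jones",   "Virginia Tech",  7, "2017-01-15"),
    ("R. Barrett", "Virginia Tech", 11, "2019-03-29"),
]

def is_first_to_reach(player, opponent, threshold, games):
    """True if `player` is the earliest player in the log to record at
    least `threshold` assists in a game against `opponent`."""
    qualifying = sorted(
        (g for g in games if g[1] == opponent and g[2] >= threshold),
        key=lambda g: g[3],  # ISO dates sort chronologically as strings
    )
    return bool(qualifying) and qualifying[0][0] == player
```

The expensive part in practice is that a claim like "first player to have at least N assists against team X" must be checked for every combination of player, statistic, threshold, and opponent, which is why distributing the candidate checks across a cluster pays off.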
In this project, we optimize a system for automatically mining interesting "factlets" from basketball game statistics. After each Duke Men's basketball game, we examine individual players' performance in the context of all historical data and generate noteworthy statements such as "in the ACC tournament game vs.
With the surge in computer science enrollment at universities over the last decade, improving pedagogy in the field has increasing value and wide-reaching effects. This improvement is especially useful in introductory computer science courses (CS1), since student experience in the first programming course is known to heavily influence students' desire to stay in the field.
Making complex decisions in areas like science, government policy, finance, and clinical treatment requires integrating and reasoning over disparate data sources. While some decisions can be made from a single source of information, others require considering multiple pieces of evidence and how they relate to one another.
We aim to create the highest-quality treatment-control matches possible for categorical data in the potential outcomes framework. The method proposed in this work matches units on a weighted Hamming distance that takes into account the relative importance of the covariates. To match units on as many relevant variables as possible, the algorithm creates a hierarchy of covariate combinations on which to match (similar to downward closure), solving an optimization problem for each unit in order to construct the optimal matches.
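As a rough illustration of the matching criterion only (not the hierarchical algorithm or the per-unit optimization), a weighted Hamming distance over categorical covariates can be sketched as follows, with invented covariates and weights:

```python
def weighted_hamming(u, v, weights):
    """Weighted Hamming distance: sum the weights of the covariates on
    which the two categorical vectors disagree."""
    return sum(w for a, b, w in zip(u, v, weights) if a != b)

def best_control_match(treated, controls, weights):
    """Return the control unit closest to `treated` under the distance."""
    return min(controls, key=lambda c: weighted_hamming(treated, c, weights))

weights = [3.0, 1.0, 0.5]               # hypothetical covariate importances
treated = ("male", "smoker", "urban")
controls = [
    ("male", "smoker", "rural"),        # differs on the least-important covariate
    ("female", "smoker", "urban"),      # differs on the most-important covariate
]
match = best_control_match(treated, controls, weights)
```

Weighting the covariates encodes the idea that disagreeing on a highly relevant covariate should cost more than disagreeing on a marginal one, so a control that differs only on an unimportant covariate is preferred.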
The insecurity of Internet services can lead to disastrous consequences: confidential communications can be monitored, financial information can be stolen, and our critical Internet infrastructure can be crippled. However, much prior work on Internet services focuses only on the security of an individual network layer in isolation, whereas adversaries do quite the opposite: they look for opportunities to exploit the interactions across heterogeneous components and layers to compromise system security.
Discrete Optimization algorithms underlie intelligent decision-making in a wide variety of domains. From airline fleet scheduling to kidney exchanges and data center resource management, decisions are often modeled with binary on/off variables that are subject to operational and financial constraints.
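The on/off-variable modeling described above can be illustrated with a toy project-selection problem solved by exhaustive search. All the numbers here are invented, and real instances of this scale and beyond are solved with integer-programming techniques such as branch-and-bound rather than enumeration.

```python
from itertools import product

# Hypothetical candidate projects as (value, cost) pairs, and a budget.
projects = [(10, 4), (7, 3), (5, 2), (3, 1)]
budget = 6

def best_selection(projects, budget):
    """Exhaustively search binary on/off assignments (fine for tiny n;
    real solvers rely on branch-and-bound and cutting planes)."""
    best_val, best_x = 0, (0,) * len(projects)
    for x in product((0, 1), repeat=len(projects)):
        cost = sum(xi * c for xi, (_, c) in zip(x, projects))
        value = sum(xi * v for xi, (v, _) in zip(x, projects))
        if cost <= budget and value > best_val:
            best_val, best_x = value, x
    return best_val, best_x

best_value, selection = best_selection(projects, budget)
```

Each binary variable turns a project on or off, and the budget plays the role of an operational or financial constraint; fleet scheduling, kidney exchange, and resource management instances have the same shape, only with many more variables and constraints.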