Algorithms play an increasing role in informing human decisions. We consider two classes of problems where this is the case. First, we discuss the design of algorithms for shared ownership. We extend the standard fair division framework to allow resources to be public, meaning that multiple agents can benefit from them simultaneously, and problems to be online, rather than one-shot. Second, we consider the problem of eliciting information for probabilistic forecasting.
End-of-Moore's-Law Challenges and Opportunities: Computer Architecture Perspectives for the Post-ISA Era
For decades, Moore's Law and its partner Dennard Scaling have driven technology trends that have enabled exponential performance improvements in computer systems at manageable power dissipation. With the slowing of Moore/Dennard improvements, designers have turned to a range of approaches for extending the scaling of computer-system performance and power efficiency. Unfortunately, these scaling gains come at the expense of degraded hardware-software abstraction layers, increased complexity at the hardware-software interface, and increased challenges for software reliability.
Reasoning with equations is a central part of mathematics. Typically we think of solving equations, but another role they play is to define algebraic structures like groups or vector spaces. Equational logic was formalized and developed by Birkhoff in the 1930s and led to a subject called universal algebra. Universal algebra was used in formalizing concepts of data types in computer science. In this talk I will present a quantitative analogue of equational logic: we write expressions like s =_ε t with the intended interpretation "s is within ε of t".
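As a rough illustration (the exact axiomatization presented in the talk may differ), the natural quantitative analogues of reflexivity, symmetry, the triangle inequality, and weakening can be written as:

```latex
% Illustrative inference rules for quantitative equations s =_\varepsilon t;
% the talk's exact rules and side conditions may differ.
\begin{gather*}
  \vdash t =_{0} t
  \qquad\qquad
  t =_{\varepsilon} s \;\vdash\; s =_{\varepsilon} t
  \\[4pt]
  t =_{\varepsilon} s,\; s =_{\varepsilon'} u \;\vdash\; t =_{\varepsilon + \varepsilon'} u
  \qquad\qquad
  t =_{\varepsilon} s \;\vdash\; t =_{\varepsilon'} s \quad \text{whenever } \varepsilon' \geq \varepsilon
\end{gather*}
```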
Every field has data. We use data to discover new knowledge, to interpret the world, to make decisions, and even to predict the future. The recent convergence of big data, cloud computing, and novel machine learning algorithms and statistical methods is causing an explosive interest in data science and its applicability to all fields. This convergence has already enabled the automation of some tasks at levels that surpass human performance. The novel capabilities we derive from data science will drive our cars, treat disease, and keep us safe.
The social sciences are crucial for deciding how billions of dollars are spent, and yet they are often starved for data and badly underserved by modern computational tools. Building data-intensive systems for social science workloads holds the promise of enabling exciting discoveries in both computational and domain-specific fields, while also making an outsized real-world impact.
Deep learning has had enormous success on perceptual tasks but still struggles to provide a model for inference. To address this gap, we have been developing Memory-Attention-Composition networks (MACnets). The MACnet design provides a strong prior for explicitly iterative reasoning, enabling it to support explainable, structured learning, as well as good generalization from a modest amount of data. The model builds on the great success of existing recurrent cells such as LSTMs: a MACnet is a sequence of applications of a single recurrent Memory, Attention, and Composition (MAC) cell.
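A minimal sketch of how such a recurrent cell might look in PyTorch; the unit structure, dimensions, and simplified attention below are assumptions for illustration, not the actual MACnet implementation.

```python
# A simplified MAC-style recurrent cell: a control unit attends over the
# question, a read unit attends over a knowledge base, and a write unit
# updates the memory. Dimensions and projections are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MACCell(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.control_proj = nn.Linear(2 * dim, dim)  # combines question and previous control
        self.attn_ctrl = nn.Linear(dim, 1)           # attention scores over question words
        self.attn_know = nn.Linear(dim, 1)           # attention scores over knowledge items
        self.read_proj = nn.Linear(2 * dim, dim)     # memory-knowledge interaction
        self.write_proj = nn.Linear(2 * dim, dim)    # merges retrieved info with memory

    def forward(self, control, memory, question, context_words, knowledge):
        # Control unit: decide which part of the question to attend to now.
        cq = self.control_proj(torch.cat([control, question], dim=-1))
        logits = self.attn_ctrl(cq.unsqueeze(1) * context_words).squeeze(-1)
        control = (F.softmax(logits, dim=-1).unsqueeze(-1) * context_words).sum(dim=1)

        # Read unit: retrieve information from the knowledge base, guided by
        # the current control and the previous memory.
        interact = self.read_proj(
            torch.cat([memory.unsqueeze(1).expand_as(knowledge), knowledge], dim=-1))
        logits = self.attn_know(control.unsqueeze(1) * interact).squeeze(-1)
        retrieved = (F.softmax(logits, dim=-1).unsqueeze(-1) * knowledge).sum(dim=1)

        # Write unit: integrate the retrieved information into the memory.
        memory = self.write_proj(torch.cat([retrieved, memory], dim=-1))
        return control, memory
```

The key design choice this sketch tries to reflect is that one shared cell is applied for a fixed number of reasoning steps, so the network's depth corresponds to the number of explicit reasoning iterations rather than to a stack of distinct layers.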
Data centers have emerged as one of the dominant platforms for cloud computing. However, several interesting new questions related to scheduling arise in this setting. In this survey talk, we discuss several problems related to scheduling in data centers. The talk covers job scheduling problems related to the utilization efficiency of VMs, along with questions dealing with basic communication issues that arise when multiple competing applications are running. We will also briefly discuss questions related to scheduling across multiple data centers.
Users are faced with an increasing onslaught of data, whether they are choosing movies to watch, assimilating data from multiple sources, or finding information relevant to their lives in open data registries. In this talk I discuss some of the recent and ongoing work on how to improve the understanding and exploration of such data, particularly by users with little database background.
Many modern data-intensive computational problems either require, or benefit from, distance or similarity data that adhere to a metric. The algorithms then run faster or have better performance guarantees. Unfortunately, in real applications the data are messy and values are noisy, and the distances between the data points are far from satisfying a metric. Consequently, there are a number of different algorithms for finding the closest set of distances to the given ones that also satisfy a metric (sometimes with the extra condition of being Euclidean).
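As a simple illustration of one such repair (not necessarily one of the algorithms discussed in the talk), a decrease-only repair can be obtained by replacing each dissimilarity with the shortest-path distance over the complete graph of noisy values:

```python
# A minimal decrease-only metric repair: take the Floyd-Warshall (shortest-path)
# closure of a symmetric nonnegative dissimilarity matrix. The result always
# satisfies the triangle inequality and only ever decreases entries.
import numpy as np

def decrease_only_metric_repair(D):
    D = np.array(D, dtype=float)
    n = D.shape[0]
    for k in range(n):
        # Relax every pair (i, j) through intermediate point k.
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

if __name__ == "__main__":
    noisy = np.array([[0.0, 1.0, 5.0],
                      [1.0, 0.0, 1.0],
                      [5.0, 1.0, 0.0]])   # the 5.0 entries violate the triangle inequality
    print(decrease_only_metric_repair(noisy))  # the 5.0 entries are reduced to 2.0
```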
We present an algorithm for creating high-resolution, anatomically plausible images that are consistent with acquired clinical brain MRI scans with large inter-slice spacing. Although large databases of clinical images contain a wealth of information, medical acquisition constraints result in sparse scans that miss much of the anatomy. These characteristics often render computational analysis impractical, as standard processing algorithms tend to fail when applied to such images.
In many settings, learning algorithms are deployed "in the wild", where their output is used before the classifier has been fully trained. In this online context, a natural model of interaction is Angluin's (1988) Equivalence Query model: the algorithm proposes an output classifier, and the user either confirms that the classifier is correct or provides the algorithm with a point that it mislabels. The algorithm takes this feedback into account in producing a new, potentially very different, classifier, and the process repeats.
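A toy sketch of this interaction loop; the threshold learner, oracle, and update rule below are illustrative assumptions, not part of the talk.

```python
# The Equivalence Query loop: the learner proposes a hypothesis, and the oracle
# either accepts it or returns a mislabeled point that the learner folds into
# its next hypothesis. The threshold concept below is only an illustration.
def equivalence_query_loop(oracle, hypothesis, update, max_rounds=100):
    for _ in range(max_rounds):
        counterexample = oracle(hypothesis)            # None means "hypothesis is correct"
        if counterexample is None:
            return hypothesis
        hypothesis = update(hypothesis, counterexample)  # possibly a very different classifier
    return hypothesis

if __name__ == "__main__":
    TARGET = 7                                         # true concept: x is positive iff x >= 7

    def oracle(h):
        # Return some point on which the hypothesis threshold h disagrees with the target.
        for x in range(10):
            if (x >= h) != (x >= TARGET):
                return x
        return None

    def update(h, x):
        # x was mislabeled by h, so its true label is the opposite of h's prediction:
        # move the threshold just enough to label x correctly.
        return x + 1 if x >= h else x

    print(equivalence_query_loop(oracle, 0, update))   # converges to 7
```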
Basketball is always a hot topic at Duke. Using Duke men's basketball data, we aim to find the highlights and interesting facts of a game and let fans share them easily on social networks. The ultimate goal of this project is to tweet the most interesting facts about a game automatically, right after it finishes. To achieve that goal, we analyze performances to find interesting claims efficiently, rank them by impressiveness, and ensure the diversity of our final tweets.
Research in vehicular networks has intensified due to their promise of reducing vehicle collisions, providing driver assistance, and increasing fuel economy. A main component of vehicular networks is Vehicle-to-Vehicle (V2V) communication, which allows vehicles to share their status information by frequently broadcasting their state, such as current speed or brake pressure. Sharing status information will allow other drivers to become aware of a vehicle's location and immediate intent, but not necessarily its intended route or future behavior.
Dimensionality reduction is indispensable for clustering and categorization of high-dimensional data. t-Distributed Stochastic Neighbor Embedding (t-SNE) focuses on preserving the stochastic neighbor relationship, while diffusion kernel embedding aims to preserve local and global geometric information. Both algorithms are followed by classification of the data in the dimension-reduced spaces.
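A minimal sketch of the embed-then-classify pipeline, using scikit-learn's t-SNE and a k-nearest-neighbor classifier as stand-ins; the diffusion kernel embedding and the classifiers actually used are not specified here.

```python
# Reduce high-dimensional data to 2D with t-SNE, then classify in the
# embedded space. This is only a toy pipeline on a standard dataset.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)

# t-SNE has no out-of-sample transform, so in this toy example all points are
# embedded first and the train/test split is done afterwards.
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_2d, y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("accuracy in the 2D embedding:", clf.score(X_test, y_test))
```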
Given their low operating costs and flight capabilities, Unmanned Aerial Vehicles (UAVs), especially small UAVs, have a wide range of applications, from civilian rescue missions to military surveillance. Easy control from a highly automated system has made these compact UAVs particularly efficient and effective devices by alleviating human operator workload. However, whether or not automation leads to increased performance is not just a matter of system design; it also requires operators' thorough understanding of the system's behavior.
What if I told you I had bulletproof evidence of a serious threat to American national security – a terrorist attack in which a jumbo jet will be hijacked and crashed every 12 days? Thousands will continue to die unless we act now. The accuracy of my numbers is more conclusive than any CIA or FBI intelligence on international or domestic terrorist activity. This is the question before us today – but the threat doesn't come from terrorists. The threat comes from climate change and air pollution.
PageRank is arguably the most popular graph ranking algorithm and has been used to analyze large networks.
This thesis presents and compares different iteration methods for computing PageRank, together with a network analysis, and describes new methods for computing Personalized PageRank with varying damping factors.
First, a matrix analysis of each of the network graphs is conducted. The Dulmage-Mendelsohn decomposition is used to obtain a block-wise upper triangular structure, and a spectral node-reordering technique is applied to accelerate the computation.
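For reference, a standard power-iteration sketch of (personalized) PageRank with a damping factor; the specific iteration methods and reordering schemes compared in the thesis are not reproduced here.

```python
# Power iteration for PageRank: x <- alpha * x P + (1 - alpha) * v, where P is
# the row-stochastic transition matrix, alpha the damping factor, and v the
# (possibly personalized) teleportation distribution.
import numpy as np

def pagerank(A, alpha=0.85, personalization=None, tol=1e-10, max_iter=1000):
    """A: adjacency matrix with A[i, j] = 1 if there is an edge i -> j."""
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    # Row-stochastic transition matrix; dangling nodes jump uniformly.
    P = np.where(out_deg[:, None] > 0, A / np.maximum(out_deg, 1)[:, None], 1.0 / n)
    v = np.full(n, 1.0 / n) if personalization is None else personalization / personalization.sum()
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_new = alpha * (x @ P) + (1 - alpha) * v
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

if __name__ == "__main__":
    A = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [1, 0, 0]], dtype=float)
    print(pagerank(A, alpha=0.85))                                       # uniform teleportation
    print(pagerank(A, alpha=0.85, personalization=np.array([1.0, 0, 0])))  # personalized to node 0
```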
Intelligent robots require advanced vision capabilities to perceive and interact with the real physical world. While computer vision has made great strides in recent years, its predominant paradigm still focuses on analyzing image pixels to infer 2D output representations (bounding boxes, segmentations, etc.), which remain far from sufficient for real-world robotics applications.