While state-of-the-art machine learning models are deep, large-scale, sequential, and highly nonconvex, the backbone of modern learning is simple algorithms such as stochastic gradient descent or, for reinforcement learning tasks, Q-learning. A basic question endures: why do simple algorithms work so well even in these challenging settings?
As the proliferation of sensors rapidly makes the Internet of Things (IoT) a reality, the devices and sensors in this ecosystem (smartphones, video cameras, home automation systems, and autonomous vehicles, among others) constantly map out the real world, producing unprecedented amounts of connected data that capture complex and diverse relations. Unfortunately, existing big data processing and machine learning frameworks are ill-suited to analyzing such dynamic connected data and face several challenges when employed for this purpose.
Natural language processing (NLP) has come of age. For example, semantic role labeling (SRL), which automatically annotates sentences with a labeled graph representing “who” did “what” to “whom,” has seen a nearly 40% reduction in error over the past ten years, bringing it to useful accuracy. As a result, a myriad of practitioners now want to deploy NLP systems on billions of documents across many domains. However, state-of-the-art NLP systems are typically optimized for neither cross-domain robustness nor computational efficiency.
How can we intelligently acquire information for decision making when facing a large volume of data? In this talk, I will focus on learning and decision-making problems that arise in robotics, scientific discovery, and human-centered systems, and show how we can develop principled approaches that actively extract information, identify the data most relevant to the learning task, and make effective decisions under uncertainty.
In this talk, I will give an overview of our recent progress towards understanding how we learn large-capacity machine learning models. In the modern practice of machine learning, especially deep learning, many successful models have far more trainable parameters than training examples. Consequently, the optimization objective for training such models has multiple minimizers that perfectly fit the training data.
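As a toy illustration of why such an objective has many minimizers, the sketch below assumes a linear model with two trainable weights and a single training example; the data and the names `x`, `y`, `squared_loss`, and `gradient_descent` are hypothetical and chosen only for illustration. Plain gradient descent started from two different points reaches two different weight vectors, both of which fit the training data perfectly.

```python
# Toy overparameterized problem (hypothetical data): one training example,
# two trainable weights, linear model pred = w . x.
x = (1.0, 2.0)
y = 3.0

def squared_loss(w):
    """Squared error of the linear model on the single training example."""
    pred = w[0] * x[0] + w[1] * x[1]
    return (pred - y) ** 2

def gradient_descent(w, lr=0.05, steps=500):
    """Plain full-batch gradient descent on squared_loss, starting from w."""
    for _ in range(steps):
        residual = w[0] * x[0] + w[1] * x[1] - y
        # The gradient of (w . x - y)^2 is 2 * (w . x - y) * x.
        w = (w[0] - lr * 2 * residual * x[0],
             w[1] - lr * 2 * residual * x[1])
    return w

# Two initializations converge to two different zero-loss minimizers.
w_a = gradient_descent((0.0, 0.0))
w_b = gradient_descent((2.0, -2.0))
print(w_a, squared_loss(w_a))  # approximately (0.6, 1.2), loss ~0
print(w_b, squared_loss(w_b))  # approximately (3.0, 0.0), loss ~0
```

Every point on the line w[0] + 2*w[1] = 3 is a global minimizer in this toy problem. Because each gradient step is a multiple of x, gradient descent never changes the component of w orthogonal to x, so which minimizer it reaches depends on where it starts.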
Learning from interaction with the environment (trying untested actions, observing successes and failures, and tying effects back to causes) is one of the first capabilities that come to mind when considering intelligent agents. Reinforcement learning is the area of artificial intelligence research whose goal is to allow autonomous agents to learn in this way. Despite many recent empirical successes, most modern reinforcement learning algorithms are still limited by the large amount of experience required before useful skills are learned.
Cloud computing plays a critical role in providing computing resources to many organizations. The relentless growth in demand for cloud services makes reliability and efficiency the two primary metrics of interest. However, existing data center system designs fall short of these two goals. Specifically, (1) operating systems impose significant overheads when providing virtualization support to cloud applications; (2) network infrastructure incurs excessive cost; and (3) infrastructure problems are notoriously difficult to debug and mitigate.
Correctness and security problems in modern computer systems can result from problematic hardware event orderings and interleavings during an application’s execution.
Neural predictive coding is an enormously successful approach to unsupervised representation learning in natural language processing. In this approach, a large-scale neural language model is trained to predict the missing signal (e.g., next word, next sentence) and the trained model is used in downstream tasks to produce useful text representations. While effective, it is computationally difficult to work with and yields uninterpretable representations.
Online transaction processing (OLTP) is critical for applications including finance, e-commerce, social networks, and healthcare. The increasing performance demands of these applications require OLTP to scale massively. Concurrency control is a major scalability bottleneck in such systems.
Discrete optimization algorithms underlie intelligent decision-making in a wide variety of domains. From airline fleet scheduling to kidney exchanges and data center resource management, decisions are often modeled with binary on/off variables that are subject to operational and financial constraints.
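A minimal sketch of this modeling pattern, using a made-up instance: three binary on/off decisions with hypothetical values and costs, and a single budget constraint standing in for the operational and financial constraints. The function name `best_selection` and all numbers are illustrative assumptions, and exhaustive enumeration over all 2^n assignments is used only because the instance is tiny.

```python
from itertools import product

# Hypothetical instance: each decision x_i in {0, 1} turns an option on or off.
values = [6, 5, 4]   # benefit of turning each option on
costs = [4, 3, 2]    # cost of turning each option on
budget = 5           # single constraint: total cost must not exceed the budget

def best_selection(values, costs, budget):
    """Return the best feasible 0/1 assignment and its value by brute force."""
    best_x, best_value = None, float("-inf")
    # Enumerate every assignment of the binary variables (2^n of them).
    for x in product((0, 1), repeat=len(values)):
        cost = sum(c * xi for c, xi in zip(costs, x))
        if cost > budget:
            continue  # infeasible: violates the budget constraint
        value = sum(v * xi for v, xi in zip(values, x))
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value

print(best_selection(values, costs, budget))  # ((0, 1, 1), 9)
```

Turning on the single most valuable option first (value 6, cost 4) leaves no room for anything else, whereas the optimum combines the other two options for a total value of 9. Interactions like this are what make such problems hard, and enumeration does not scale beyond tiny instances, so practical solvers rely on far more sophisticated methods.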