Neural Network Architectures for Fast and Robust NLP
Natural language processing (NLP) has come of age. For example, semantic role labeling (SRL), which automatically annotates sentences with a labeled graph representing “who” did “what” to “whom,” has seen a nearly 40% reduction in error over the past ten years, bringing it to useful accuracy. As a result, a myriad of practitioners now want to deploy NLP systems on billions of documents across many domains. However, state-of-the-art NLP systems are typically optimized for neither cross-domain robustness nor computational efficiency.
In this talk, I will present two new methods to facilitate fast and robust NLP. First, I will describe Iterated Dilated Convolutional Neural Networks (ID-CNNs, EMNLP 2017), a faster alternative to bidirectional LSTMs for sequence labeling, which, compared to traditional CNNs, have better capacity for large context and structured prediction. Through a distinct combination of network structure, parameter sharing, and training procedures, ID-CNNs enable dramatic 14-20x test-time speedups while retaining accuracy comparable to the Bi-LSTM-CRF. Second, I will present Linguistically-Informed Self-Attention (LISA, EMNLP 2018 Best Long Paper), a neural network model that combines multi-head self-attention with multi-task learning across four related tasks. Unlike previous models, which require significant pre-processing to prepare syntactic features, LISA can incorporate syntax using only raw tokens as input, encoding the sequence just once to simultaneously perform part-of-speech tagging, syntactic parsing, predicate detection, and semantic role labeling for all predicates. Syntax is incorporated through the attention mechanism, which is trained to focus on the syntactic parent of each token. We show that integrating linguistic structure in this way leads to substantial improvements over the previous state-of-the-art (syntax-free) neural network models for SRL, especially when evaluating out-of-domain, where LISA obtains a nearly 10% reduction in error while also providing speed advantages. I will conclude by discussing my plans for future work, which will enable a wide array of practitioners to efficiently and robustly derive meaning from text to facilitate citizen engagement, fairness, and social change.
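To illustrate why dilation helps a convolutional tagger see a wide context, here is a minimal sketch of how the receptive field grows as layers are stacked. The kernel width of 3, the four layers, and the doubling dilation schedule are illustrative assumptions, not the settings used in the ID-CNN work, and `receptive_field` is a hypothetical helper name.

```python
def receptive_field(kernel_width, dilations):
    """Number of input tokens visible to one output position after
    stacking 1-D convolutions with the given per-layer dilations."""
    field = 1
    for d in dilations:
        # Each layer widens the window by (kernel_width - 1) * d tokens.
        field += (kernel_width - 1) * d
    return field


layers = 4  # assumed depth, for illustration only
plain = receptive_field(3, [1] * layers)                       # no dilation
dilated = receptive_field(3, [2 ** i for i in range(layers)])  # dilations 1, 2, 4, 8

print(plain, dilated)  # 9 31
```

With the same number of layers and parameters per layer, the undilated stack covers 9 tokens while the dilated stack covers 31, since the window grows exponentially with depth when dilations double and only linearly otherwise.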
Emma Strubell is a final-year PhD candidate in the College of Information and Computer Sciences at UMass Amherst, advised by Andrew McCallum. Her research aims to provide fast and robust natural language processing to the diversity of academic and industrial investigators eager to pull insight and decision support from massive text data in many domains. Toward this end, she works at the intersection of natural language understanding, machine learning, and deep learning methods cognizant of modern tensor processing hardware. She has applied her methods to scientific knowledge bases in collaboration with the Chan Zuckerberg Initiative, and to advanced materials synthesis in collaboration with faculty at MIT. Emma has interned as a research scientist at Amazon and Google and received the IBM PhD Fellowship Award. She is also an active advocate for women in computer science, serving as leader of the UMass CS Women’s group, where she co-organized and won grants to support cross-cultural peer mentoring, conference travel for women, and technical workshops. Her research has been recognized with best paper awards at ACL 2015 and EMNLP 2018.