Building knowledge bases for natural language understanding

Duke Computer Science Colloquium
Speaker Name
Derry Wijaya
Date and Time
-
Location
LSRC D106
Notes
Lunch served at 11:45 am
Abstract

Intelligent systems capable of understanding natural language have many applications, from healthcare to business to law. One way to formulate natural language understanding is as the task of mapping natural language text to its meaning representation: entities and relations anchored to the world. Knowledge bases (KBs) can facilitate natural language understanding by mapping words to their meaning representations, for example, nouns to entities and verbs to relations. State-of-the-art knowledge bases such as NELL, Freebase, and YAGO, which contain beliefs about real-world entities and relations, have been constructed successfully by leveraging the redundancy of millions of documents to detect language patterns. The accumulated knowledge has been used to improve the ability of intelligent systems to make inferences. In multilingual and multimodal settings, knowledge bases present a virtuous learning opportunity: more beliefs, held with higher confidence, can be extracted by processing data in more languages or modalities; in turn, since entities and their relations in the KBs exist in the world no matter what language or modality is used to express them, KBs can act as an interlingua for relating corpora in different languages and modalities through KB entities and relations. This is especially useful for low-resource languages, where there are few if any aligned bilingual texts to support effective natural language processing (NLP) tasks such as machine translation or cross-lingual disambiguation. In this talk, I will elaborate on this virtuous circle, starting with building knowledge bases that map verbs to real-world relations, followed by results on using knowledge bases to translate words from monolingual-only corpora.

Short Biography

Derry Wijaya is a postdoctoral researcher at the University of Pennsylvania. Her research interests include machine learning, natural language processing, and data mining. She works with Professor Chris Callison-Burch on using machine learning to build computer systems that intelligently process and understand human languages, particularly in low-resource and multilingual settings. She received her Ph.D. from Carnegie Mellon University, working with Professor Tom Mitchell on the Never-Ending Language Learning (NELL) project, and her MSc and Bachelor of Computing from the National University of Singapore.

Host
Hai (Helen) Li