We're on TV! Well, actually the seminar is now being held over a
video link between UNC and Duke. The rooms are 08 Peabody at UNC and
North Building 130a at Duke. Time: 2pm-3:20pm.
Evolving Syllabus
9/3/97: Organizational meeting. Michael L. leads discussion on
applications and algorithms background.
9/19/97: Greg leads "An introduction to hidden Markov
models" (discussion on the Viterbi algorithm for part-of-speech
tagging and introduction to HMMs).
11/14/97: Fan leads "A statistical approach to machine
translation" and "Aligning sentences in parallel corpora". Papers not
available on line. Meet in D243 at Duke. NO VIDEO LINK ON THIS DATE!
Researchers creating practical systems that manipulate human languages
are turning more frequently to statistical or corpus-based approaches.
The goal of this seminar is to familiarize participants with some of the
applications and techniques that define this emerging and exciting
area.
Sample natural-language applications include:
Syntax: Part-of-speech tagging, parsing, language modeling,
prepositional-phrase attachment, spelling and grammar correction,
word segmentation, term and name identification, morphological
analysis
Participants will read and discuss research papers from recent
journals and conference proceedings.
Papers to Discuss
Full References
Eugene Charniak, Curtis Hendrickson, Neil Jacobson, and Mike
Perkowitz. Equations
for part-of-speech tagging. In Proceedings of the Eleventh
National Conference on Artificial Intelligence, Menlo Park: AAAI
Press/MIT Press (1993) 784-789.
L.R. Rabiner and B.H. Juang, "An Introduction to Hidden Markov
Models", IEEE ASSP Magazine, Jan., 1986, pp. 4-16.
Eugene Charniak, Glenn Carroll, John Adcock, Anthony Cassandra,
Yoshihiko Gotoh, Jeremy Katz, Michael Littman and John McCann. Taggers
for parsers. Artificial Intelligence, 85 (1-2): 45-57,
August, 1996.
Adwait Ratnaparkhi. A maximum
entropy part-of-speech tagger. In Proceedings of the Empirical
Methods in Natural Language Processing Conference, May 17-18,
1996.
Steven Abney. Statistical
methods and linguistics. In: Judith Klavans and Philip Resnik
(eds.), The Balancing Act. The MIT Press, Cambridge, MA, pages 2-26
(Chapter 1), 1996.
Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della
Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and
Paul S. Roossin. A statistical approach to machine translation.
Computational Linguistics, Volume 16, Number 2, June
1990.
Peter F. Brown, Jennifer C. Lai, and Robert L. Mercer. Aligning
sentences in parallel corpora. In Proceedings of the Conference of
the Association for Computational Linguistics, Berkeley,
pp. 169-176, 1991.
Hinrich Schütze and Jan O. Pedersen. Information
retrieval based on word senses. In Fourth Annual Symposium
on Document Analysis and Information Retrieval, pages 161-175,
Las Vegas NV, 1995.
Other References
Thomas K. Landauer and Michael L. Littman. Fully automatic
cross-language document retrieval using latent semantic
indexing. In Proceedings of the Sixth Annual Conference of the
UW Centre for the New Oxford English Dictionary and Text
Research, pp. 31-38. UW Centre for the New OED and Text Research,
Waterloo Ontario, October 1990.
D. Beeferman, A. Berger, and J. Lafferty. Text
segmentation using exponential models. In Proceedings of the
Second Conference On Empirical Methods in NLP, Providence, RI,
1997.
Eric Sven Ristad and Robert G. Thomas. New techniques for
context modeling. In Proceedings of the 33rd Annual Meeting
of the ACL, Cambridge, MA, June 27-30, 1995.
Kenneth W. Church. One Term Or Two? Proceedings of the 18th Annual
International Conference on Research and Development in Information
Retrieval (SIGIR'95). Seattle, WA, USA, 1995. 310-318.
Gale, W., K. Church, and D. Yarowsky. ``A Method for Disambiguating
Word Senses in a Large Corpus.'' Computers and the Humanities. 26,
pp. 415-439, 1992.
M. E. Lesk, ``Automatic Sense Disambiguation Using Machine Readable
Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone,''
Proc. 1986 SIGDOC Conference, Toronto, Ontario, June, 1986.
Eugene Charniak, 1997 AAAI best paper. Charniak and Goldman, Bayes
Nets in story understanding.
Church's tagging stuff. And Good-Turing.
Brown corpus.
Peter Brown et al. Word sense disambiguation using statistical
methods. Also Class-based n-gram models of natural language. And the
classic automatic translation work.
Ido Dagan, Alon Itai. Word sense disambiguation using a second
language monolingual corpus.
Stuff in The Computation and
Language E-Print Archive: ``Cue Phrase Classification Using
Machine Learning'' (Litman), ``Stochastic Attribute-Value Grammars''
(Abney), ``Learning string edit distance'' (Ristad and Yianilos),
``Unsupervised Language Acquisition'' (de Marcken), ``Nonuniform
Markov models'' (Ristad and Thomas), ``Comparative Experiments on
Disambiguating Word Senses: An Illustration of the Role of Bias in
Machine Learning'' (Mooney), ``Hybrid language processing in the
Spoken Language Translator'' (Rayner and Carter), ``Automatic
Extraction of Subcategorization from Corpora'' (Briscoe and Carroll),
``Fast Statistical Parsing of Noun Phrases for Document Indexing''
(Zhai), ``A Maximum Entropy Approach to Identifying Sentence
Boundaries'' (Reynar and Ratnaparkhi), ``Machine Transliteration''
(Knight and Graehl), ``Sense Tagging: Semantic Tagging with a
Lexicon'' (Wilks and Stevenson), ``A Corpus-Based Approach for
Building Semantic Lexicons'' (Riloff and Shepherd), ``Mistake-Driven
Learning in Text Categorization'' (Dagan, Karov, and Roth),
``Distinguishing Word Senses in Untagged Text'' (Pedersen and Bruce),
``A Linear Observed Time Statistical Parser Based on Maximum Entropy
Models'' (Ratnaparkhi).
Title: Combining Multiple Methods for the Automatic Construction of
Multilingual WordNets, Authors: Jordi Atserias (Universitat
Politecnica de Catalunya), Salvador Climent (Universitat de
Barcelona), Xavier Farreres, German Rigau (Universitat Politecnica de
Catalunya), Horacio Rodriguez (Universitat Politecnica de Catalunya)
Comments: 7 pages, 4 postscript figures Journal-ref: EACL/ACL 97
Madrid pages 48-55, Paper: cmp-lg/9709003
Brown TRs: CS-94-07 Eugene Charniak and Glenn Carroll,
``Context-Sensitive Statistics for Improved Grammatical Language
Models'', CS-94-08 Glenn Carroll and Eugene Charniak, ``Combining
Grammars For Improved Learning''