|
|
Fall 1998
|
|
Using the Web to Solve Crossword Puzzles
|
Introduction
Regular meeting time: Wednesdays 3pm-4pm D243 (LSRC), working group
meeting Wednesdays 4pm-5pm D209.
For the past two years, the American Association for Artificial
Intelligence (AAAI) has sponsored a "Hall of Champions" in which the
world's best programs play the world's best humans in games such as
backgammon, chess, Othello, Scrabble, poker, bridge, checkers, and go.
In this seminar, we will try to put together the first-ever program to
compete with human beings in tournament crossword-puzzle solving. If
successful, we will exhibit the program at the Hall of Champions for
1999.
Unlike the other games mentioned, crossword puzzles are
AI-complete---that is, solving arbitrary crosswords with human-level
performance is as hard as the general human-level-intelligence problem
(e.g., the Turing test). As such, our crossword solver will need a
tremendous database of lexical and cultural knowledge in
machine-readable form---just the sort of thing we can get from the
web.
We will study and perhaps extend ideas from a variety of disciplines:
- AI statistical language processing: computing word-meaning similarities
- computational linguistics: parsing clues
- AI search and constrained optimization: filling in the grid
- linear algebra: statistical derivation of lexical information
- I/O efficient algorithms: manipulating the huge text database
We welcome individuals from all backgrounds and at all levels of
training---there are lots of interesting experiments to run. We will
hold weekly discussions to explore related areas of computer science
and AI and to make progress reports. We will also have a number of
top-notch invited speakers from inside and outside Duke. Students
taking the course for credit will be required to run one seminar
session and participate in the programming effort. Auditors are
welcome to kibitz and participate in design and brainstorming
sessions.
Interested undergrads, please send me email---we can arrange for
credit via an independent study. First-year graduate students are
also very welcome (in fact, a main goal of the seminar is to give
first-year students exposure to AI research at Duke).
Evolving Syllabus
- 9/2/98: Organizational meeting. Michael L. leads discussion on
motivation and system architecture. Handout: Games Magazine article
on
AAAI
1997 Hall of Champions, and
tournament rules.
Brainstorming about experiments. Next meeting time. Essay on why
computers can't play crosswords. Give out tournament for 1983.
Notes (complements of Sushant).
- 9/9/98: Greg is going to present the results with the current
solver along with some statistics he's gathered on the difficulty of
the problem. We'll then brainstorm a bit about what statistics we
need to collect first, and divide up these tasks. Greg will also
summarize some of the resources that we have that we haven't used
yet.
- 9/16/98: Introduction to Constraint Satisfaction. Discuss
tournament scores. Papers are: CSP entry in AI encyclopedia and
Ginsberg's solver.
- 9/23/98: Video of Tom Landauer's talk on
Latent Semantic Analysis
covering its use in
representing
human-like knowledge.
- 9/30/98: More on Landauer's talk. Discuss existing crossword
programs.
- 10/7/98: Update on current solver. The mathematics of crossword
optimization. Read a chapter from Manning and Schuetze.
- 10/14/98: Kautz visits. State-of-the-art satisfiability solvers
for crossword puzzles.
- 10/21/98: Part of Speech tagging and Transformation-Based
Learning. From Foundations of Statistical Natural Language
Processing, Chris Manning and Hinrich Schuetze (forthcoming).
- 10/28/98: Topics in Information retrieval. From Foundations
of Statistical Natural Language Processing, Chris Manning and
Hinrich Schuetze (forthcoming).
- 11/4/98: Boosting
for IR (Shannon, Sushant).
- 11/11/98: Probabilitistic models of IR: Bayesian
(Joe), A
Language Modeling Approach (Jason), A
Linguistically Motivated Model (Fan).
- 11/20/98: A dynamic programming algorithm for segmenting long
clues based on observed letter probabilities (Greg).
- 11/25/98: Thanksgiving.
- 12/2/98: Connections between CSP and belief nets:
Bucket
Elimination,
Mini
Buckets (Greg, Noam, and Karl: The Three Kludges).
- 12/9/98: Review of probabilistic derivations. Writeups due.
Tentative freeze date.
- 12/14/98: Final 2pm-5pm.
Other Topics
- Algorithms for inference in belief nets.
- Machine learning and cross validation.
- Discussion of A*.
- Crossword Maestro
- Other papers listed in the Crossword FAQ
- More links: TV guide's
puzzles, Tournament
page, Cruciverb-L, Biography on
Shortz
- Links to databases: Amazon
(books), More
crossword resources, Framenet
(encoding of verbs), All Music (music
database), Miriam-Webster
Dictionary, Cliches,
Longman
dictionary, Lyrics, English word list with
other info, Online Crossword
Dictionary, Crossword dictionary
on CD-ROM, links to other
DBs (strategy, rhyming, cliche, etc.), Booklist, Internet Movie Database, CIA
World Factbook
References
Minda Zetlin. Thinking machines. Games, pages 12--15,61,
April 1998.
Will Shortz. Introduction to The New York Times Daily Crossword
Puzzles, volume 47, Random House, 1997.
Will Shortz (editor). American Championship Crosswords, Fawcett
Columbine, 1990.
Dechter
survey
CSP Survey, Dynamic
Backtracking
Ginsberg, M. L., Frank, M., Halpin, M. P., & Torrance,
M. C. (1990). Search
lessons learned from crossword puzzles. In Proceedings of the
Eighth National Conference on Artificial Intelligence, pp. 210-215.
Landauer, T. K., Laham, D., & Foltz, P. W., (1998). Learning human-like
knowledge by Singular Value Decomposition: A progress report. In
M. I. Jordan, M. J. Kearns & S. A. Solla (Eds.), Advances in Neural
Information Processing Systems 10, (pp. 45-51). Cambridge: MIT
Press.
Chris Manning and Hinrich Schuetze (forthcoming). Foundations of
Statistical Natural Language Processing, MIT Press.
Robert Schapire, Yoram Singer, Amit Singhal (to appear). Boosting
and Rocchio Applied to Text Filtering. In ACM SIGIR
'98.
Amit Singhal, Chris Buckley, Mandar Mitra (1996). Pivoted
Document Length Normalization. In ACM SIGIR '96,
(pp. 21--29).
Michelle Keim and David D. Lewis and David Madigan. Bayesian
Information Retrieval: Preliminary Evaluation. In Preliminary
Papers of the Sixth International Workshop on Artificial Intelligence
and Statistics, pages 303-310, 1997.
Jay M. Ponte and W. Bruce Croft. A Language
Modeling Approach to Information Retrieval. In Proceedings of
SIGIR 98, pp. 275-281.
Djoerd Hiemstra. A
Linguistically Motivated Probabilistic Model of Information
Retrieval. In Christos Nicolaou and Constantine Stephanidis
(eds.), Proceedings of the second European Conference on Research
and Advanced Technology for Digital Libraries: ECDL'98,
Springer-Verlag, pages 569-584, 1998.
Rina Dechter. Bucket
Elimination: A Unifying Framework For Structure-driven Inference.
To appear in the volume in "Learning and Inference in Graphical
Models."
Rina Dechter and Irina Rish. Mini-buckets:
a general scheme for approximating inference. Appears as an ICS
Technical Report, October, 1998.
Seminar Organizers
Michael L. Littman
Greg A. Keim
Last modified: Wed Aug 19 09:46:48 EDT 1998
by Michael Littman, mlittman@cs.duke.edu