CPS 370 - FALL 1998


	Fall 1998		Using the Web to Solve Crossword Puzzles

Web page for the final project.
Status of our next-to-last solver: postscript.

Introduction

Regular meeting time: Wednesdays 3pm-4pm D243 (LSRC), working group meeting Wednesdays 4pm-5pm D209.

For the past two years, the American Association for Artificial Intelligence (AAAI) has sponsored a "Hall of Champions" in which the world's best programs play the world's best humans in games such as backgammon, chess, Othello, Scrabble, poker, bridge, checkers, and go. In this seminar, we will try to put together the first-ever program to compete with human beings in tournament crossword-puzzle solving. If successful, we will exhibit the program at the Hall of Champions for 1999. Unlike the other games mentioned, crossword puzzles are AI-complete---that is, solving arbitrary crosswords with human-level performance is as hard as the general human-level-intelligence problem (e.g., the Turing test). As such, our crossword solver will need a tremendous database of lexical and cultural knowledge in machine-readable form---just the sort of thing we can get from the web. We will study and perhaps extend ideas from a variety of disciplines:

AI statistical language processing: computing word-meaning similarities
computational linguistics: parsing clues
AI search and constrained optimization: filling in the grid
linear algebra: statistical derivation of lexical information
I/O efficient algorithms: manipulating the huge text database

We welcome individuals from all backgrounds and at all levels of training---there are lots of interesting experiments to run. We will hold weekly discussions to explore related areas of computer science and AI and to make progress reports. We will also have a number of top-notch invited speakers from inside and outside Duke. Students taking the course for credit will be required to run one seminar session and participate in the programming effort. Auditors are welcome to kibitz and participate in design and brainstorming sessions.

Interested undergrads, please send me email---we can arrange for credit via an independent study. First-year graduate students are also very welcome (in fact, a main goal of the seminar is to give first-year students exposure to AI research at Duke).

Evolving Syllabus

9/2/98: Organizational meeting. Michael L. leads discussion on motivation and system architecture. Handout: Games Magazine article on AAAI 1997 Hall of Champions, and tournament rules. Brainstorming about experiments. Next meeting time. Essay on why computers can't play crosswords. Give out tournament for 1983. Notes (complements of Sushant).
9/9/98: Greg is going to present the results with the current solver along with some statistics he's gathered on the difficulty of the problem. We'll then brainstorm a bit about what statistics we need to collect first, and divide up these tasks. Greg will also summarize some of the resources that we have that we haven't used yet.
9/16/98: Introduction to Constraint Satisfaction. Discuss tournament scores. Papers are: CSP entry in AI encyclopedia and Ginsberg's solver.
9/23/98: Video of Tom Landauer's talk on Latent Semantic Analysis covering its use in representing human-like knowledge.
9/30/98: More on Landauer's talk. Discuss existing crossword programs.
10/7/98: Update on current solver. The mathematics of crossword optimization. Read a chapter from Manning and Schuetze.
10/14/98: Kautz visits. State-of-the-art satisfiability solvers for crossword puzzles.
10/21/98: Part of Speech tagging and Transformation-Based Learning. From Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schuetze (forthcoming).
10/28/98: Topics in Information retrieval. From Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schuetze (forthcoming).
11/4/98: Boosting for IR (Shannon, Sushant).
11/11/98: Probabilitistic models of IR: Bayesian (Joe), A Language Modeling Approach (Jason), A Linguistically Motivated Model (Fan).
11/20/98: A dynamic programming algorithm for segmenting long clues based on observed letter probabilities (Greg).
11/25/98: Thanksgiving.
12/2/98: Connections between CSP and belief nets: Bucket Elimination, Mini Buckets (Greg, Noam, and Karl: The Three Kludges).
12/9/98: Review of probabilistic derivations. Writeups due. Tentative freeze date.
12/14/98: Final 2pm-5pm.

References

Minda Zetlin. Thinking machines. Games, pages 12--15,61, April 1998.

Will Shortz. Introduction to The New York Times Daily Crossword Puzzles, volume 47, Random House, 1997.

Will Shortz (editor). American Championship Crosswords, Fawcett Columbine, 1990.

Dechter survey

CSP Survey, Dynamic Backtracking

Ginsberg, M. L., Frank, M., Halpin, M. P., & Torrance, M. C. (1990). Search lessons learned from crossword puzzles. In Proceedings of the Eighth National Conference on Artificial Intelligence, pp. 210-215.

Landauer, T. K., Laham, D., & Foltz, P. W., (1998). Learning human-like knowledge by Singular Value Decomposition: A progress report. In M. I. Jordan, M. J. Kearns & S. A. Solla (Eds.), Advances in Neural Information Processing Systems 10, (pp. 45-51). Cambridge: MIT Press.

Chris Manning and Hinrich Schuetze (forthcoming). Foundations of Statistical Natural Language Processing, MIT Press.

Robert Schapire, Yoram Singer, Amit Singhal (to appear). Boosting and Rocchio Applied to Text Filtering. In ACM SIGIR '98.

Amit Singhal, Chris Buckley, Mandar Mitra (1996). Pivoted Document Length Normalization. In ACM SIGIR '96, (pp. 21--29).

Michelle Keim and David D. Lewis and David Madigan. Bayesian Information Retrieval: Preliminary Evaluation. In Preliminary Papers of the Sixth International Workshop on Artificial Intelligence and Statistics, pages 303-310, 1997.

Jay M. Ponte and W. Bruce Croft. A Language Modeling Approach to Information Retrieval. In Proceedings of SIGIR 98, pp. 275-281.

Djoerd Hiemstra. A Linguistically Motivated Probabilistic Model of Information Retrieval. In Christos Nicolaou and Constantine Stephanidis (eds.), Proceedings of the second European Conference on Research and Advanced Technology for Digital Libraries: ECDL'98, Springer-Verlag, pages 569-584, 1998.

Rina Dechter. Bucket Elimination: A Unifying Framework For Structure-driven Inference. To appear in the volume in "Learning and Inference in Graphical Models."

Rina Dechter and Irina Rish. Mini-buckets: a general scheme for approximating inference. Appears as an ICS Technical Report, October, 1998.

Seminar Organizers

Michael L. Littman

Office: D209 LSRC
Phone: 660-6537
Email: mlittman@cs.duke.edu

Greg A. Keim

Office: D230 LSRC
Phone: 660-6516
Email: keim@cs.duke.edu

Last modified: Wed Aug 19 09:46:48 EDT 1998 by Michael Littman, mlittman@cs.duke.edu

Introduction

Evolving Syllabus

Other Topics

References

Seminar Organizers