Compsci 100, Spring 2009, Boggle®

You should snarf assignment boggle from http://www.cs.duke.edu/courses/spring09/cps100/snarf or browse the code directory for code files provided. See the Boggle howto file for help and more instructions.

After you snarf the assignment, move the .txt files (bogwords.txt and ospd3.txt) into the bin directory --- otherwise the code won't find them when it loads the lexicons. To do this, use the Window > Show View > Navigator menu sequence to open the Navigator view, then drag the .txt files into the bin directory with your mouse.

Genesis of Assignment

Prof. Astrachan fondly recalls neighborhood Boggle® games from his now long-ago childhood and was re-introduced to it on Sun Unix workstations as a grad student in the mid 80's. We used Boggle® for a Compsci 108 assignment at Duke in 1996. Boggle is in the Nifty Assignments Archive and the current instantiation of the assignment is a combination of efforts from Compsci educators at Duke, Stanford, and UCSD. This version emphasizes empirical analyses of tradeoffs in different implementations.

Boggle Wordgames and Background
The Wikipedia Boggle entry explains the game in detail, including popular-culture references to Boggle. Reviewing the rules on the Wikipedia site will help you understand the play of the game, but you'll be looking at implementation trade-offs that are independent of game play. You can play similar online word games, e.g., Facebook's Scramble and Yahoo! Word Racer, both of which are clearly related to Boggle.

Theoretical and practical approaches to word games have formed the basis for The World's Fastest Scrabble Program (Appel and Jacobson), for recognizing Genomic Signatures in DeBruijn Chains (Heath and Pati), and for An efficient interface between continuous speech recognition and language understanding (Oerder and Ney).

Analyzing alternative implementations and algorithms will help inform a better understanding of both work and play --- what a great way to build a foundation in understanding how to turn data into information.

In particular you'll be examining alternate implementations of a lexicon or word source. You'll work on a Trie implementation and a compressed Trie that's similar to a Radix Trie/Patricia Trie, but simpler to implement. Tries are the data structure behind predictive text used in cell-phone texting, e.g., in T9 Text Input and have formed the basis for IP-routing, e.g., see Keith Sklower's Tree-Based Packet Routing Table for Berkeley Unix although changes to IPv6 may change the utility of tries as described here: Scalable high speed IP routing lookups (Waldvogel, Varghese, Turner, Plattner).
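To give a flavor of the idea before you start: a bare-bones trie for lowercase words might look like the sketch below. This is not the assignment's trie lexicon --- the class and method names here are illustrative only, and your real implementation must fit the ILexicon interface.

```java
// Minimal trie sketch for lowercase a-z words; illustrative only.
// Each node has one child slot per letter; a flag marks word ends.
public class TrieDemo {
    static class Node {
        Node[] children = new Node[26];
        boolean isWord;
    }

    static Node root = new Node();

    // Walk/create the path of nodes for each letter, marking the last.
    static void add(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            int i = c - 'a';
            if (cur.children[i] == null) cur.children[i] = new Node();
            cur = cur.children[i];
        }
        cur.isWord = true;
    }

    // True if some added word starts with this prefix.
    static boolean hasPrefix(String prefix) {
        Node cur = root;
        for (char c : prefix.toCharArray()) {
            cur = cur.children[c - 'a'];
            if (cur == null) return false;
        }
        return true;
    }

    // True only if the full word was added (path exists AND isWord set).
    static boolean contains(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.children[c - 'a'];
            if (cur == null) return false;
        }
        return cur.isWord;
    }

    public static void main(String[] args) {
        add("cat");
        add("cart");
        System.out.println(contains("cat"));   // true
        System.out.println(hasPrefix("car"));  // true: "cart" starts with "car"
        System.out.println(contains("car"));   // false: path exists but not a word
    }
}
```

The prefix test is what makes tries attractive for Boggle: it lets a search stop as soon as no word can possibly start with the letters seen so far.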

You'll use different word-sources or lexicons. Apparently there's a controversy as to what the "best" lexicon is for playing word games like Scrabble® and Boggle®.

What You Will Do

You'll code a class to find a specific word on a Boggle® board and you'll implement two hierarchies of classes. Each hierarchy will be a group of classes that implement a common interface and allow you to reason empirically about trade-offs in implementation techniques. You'll run code/experiments to reason about the classes you write.

  1. Find Words On Board

    You'll design and code one class that implements the IWordOnBoardFinder interface to find a word on a Boggle board. You'll use this class in implementing one of the auto-player classes and in determining where a word appears on a board when you or another (human) player plays the game.

  2. Implement and benchmark lexicons

    You'll write classes that implement the ILexicon interface. Each lexicon implementation facilitates looking up a word or a possible prefix of a word. The classes differ in how efficiently they support these operations and in how much memory they use.

  3. Implement two AutoPlayers that find all words quickly

    You'll write classes that implement the IAutoPlayer interface. Each class facilitates playing a game of Boggle by finding every possible word on a given Boggle board. These classes use the lexicons from the previous hierarchy.

  4. Empirical Analyses

    You'll draw conclusions based on empirical, runtime analyses of the classes you write, determining which of several approaches is "best" and reasoning in general about the trade-offs in different implementations.


Find Words on the Board

You're given a class BadWordOnBoardFinder that implements the IWordOnBoardFinder interface. The bad word finder returns an empty list of BoardCell objects for any query, so no words will ever be found.

You'll implement the method cellsForWord in a class GoodWordOnBoardFinder using a standard, backtracking search to find a word on a Boggle board.
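As a preview of the backtracking idea, here is a standalone sketch that only reports whether a word can be traced on a grid of single characters. Your real cellsForWord must instead return the List of BoardCell positions required by the interface (and handle details described in the howto), so treat the names below as illustrative.

```java
// Hedged sketch of backtracking word search on a character grid.
// A cell is "used" while it is part of the current path and released
// (un-chosen) when the search backtracks past it.
public class BoardSearchDemo {

    // Try every cell as a starting point for the word.
    static boolean found(char[][] board, String word) {
        for (int r = 0; r < board.length; r++)
            for (int c = 0; c < board[0].length; c++)
                if (search(board, word, 0, r, c,
                           new boolean[board.length][board[0].length]))
                    return true;
        return false;
    }

    // Can word.substring(index) be traced starting at cell (r, c)?
    static boolean search(char[][] board, String word, int index,
                          int r, int c, boolean[][] used) {
        if (r < 0 || r >= board.length || c < 0 || c >= board[0].length)
            return false;                              // off the board
        if (used[r][c] || board[r][c] != word.charAt(index))
            return false;                              // reused cell or wrong letter
        if (index == word.length() - 1)
            return true;                               // matched the whole word
        used[r][c] = true;                             // choose this cell
        for (int dr = -1; dr <= 1; dr++)
            for (int dc = -1; dc <= 1; dc++)
                if ((dr != 0 || dc != 0)
                        && search(board, word, index + 1, r + dr, c + dc, used)) {
                    used[r][c] = false;
                    return true;
                }
        used[r][c] = false;                            // un-choose (backtrack)
        return false;
    }

    public static void main(String[] args) {
        char[][] board = { {'c', 'a'}, {'r', 't'} };
        System.out.println(found(board, "cart"));  // true: c->a->r->t, all adjacent
        System.out.println(found(board, "cata"));  // false: the 'a' cell can't be reused
    }
}
```

The choose/explore/un-choose pattern in search is the standard backtracking shape you'll reuse in cellsForWord.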

Details and more specifics for this part of the assignment are provided in the howto pages in the section on the IWordOnBoardFinder interface. Those pages also include hints on how to write the class described here.


Implementing Lexicons

Details about the classes you write for this part of the assignment and help in writing them are provided in the howto pages for implementing lexicons.

You'll design and code three classes for this part of the assignment and you'll analyze their performance empirically. You're given a JUnit testing class, TestLexicon, to test your implementations --- see the howto pages for help/reminders on using JUnit. In the load methods you write you can assume no duplicate words will be inserted via either the Scanner or ArrayList parameters to the load methods. You're also given a class LexiconBenchmark you can use to analyze the empirical performance of your implementations.

In Boggle®, legal words are those found in the lexicon associated with a game. A lexicon is simply a list of words; it's not a dictionary because it doesn't have an associated definition for each word.

The ILexicon interface specifies the methods every implementation must provide. You're given an implementation, SimpleLexicon, whose wordStatus method is O(n) since it simply does a linear search over the list of words it stores. Read the documentation for details of the interface. You must write two new, more efficient implementations and modify SimpleLexicon so that its wordStatus method executes more efficiently.
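One way to speed up wordStatus over a list of words is to keep the list sorted and binary search it. The sketch below shows the idea with illustrative names --- the real ILexicon defines its own status type and method signatures, so don't copy this literally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hedged sketch of a sorted-list lexicon: binary search makes
// wordStatus O(log n). Status names here are illustrative only.
public class SortedLexiconDemo {
    enum Status { WORD, PREFIX, NOT_WORD }

    private final ArrayList<String> words = new ArrayList<>();

    void load(List<String> list) {
        words.clear();
        words.addAll(list);
        Collections.sort(words);   // sort once so every lookup can binary-search
    }

    Status wordStatus(String s) {
        int index = Collections.binarySearch(words, s);
        if (index >= 0) return Status.WORD;
        int insert = -(index + 1);           // where s would be inserted
        // s is a viable prefix iff the next word in sorted order starts with it
        if (insert < words.size() && words.get(insert).startsWith(s))
            return Status.PREFIX;
        return Status.NOT_WORD;
    }

    public static void main(String[] args) {
        SortedLexiconDemo lex = new SortedLexiconDemo();
        lex.load(java.util.Arrays.asList("cart", "cat", "dog"));
        System.out.println(lex.wordStatus("cat"));  // WORD
        System.out.println(lex.wordStatus("ca"));   // PREFIX
        System.out.println(lex.wordStatus("cab"));  // NOT_WORD
    }
}
```

The prefix trick works because in a sorted list, all words beginning with a given prefix sit in one contiguous run starting at the prefix's insertion point.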


Implementing AutoPlayers

Details about the classes you write for this part of the assignment and help in writing them are provided in the howto pages for implementing AutoPlayers.

You'll implement two classes that find all the words on a Boggle board. Each class uses a different technique and you'll analyze the runtime tradeoffs in these techniques. You'll also reason empirically about the performance of these two classes when they're configured with different implementations of ILexicon lexicons.

One class, BoardFirstAutoPlayer, looks at the board and tries to form words by trying all the paths on the board. The code will be very similar to the backtracking code you wrote for the GoodWordOnBoardFinder class, but you'll prune searches early based on prefixes as described in the howto pages. The other class, LexiconFirstAutoPlayer, will be relatively simple to implement since you'll have working lexicons and a working WordOnBoardFinder from earlier in the assignment. In implementing LexiconFirstAutoPlayer you'll write code that looks up every word in the lexicon to see if it's on the board. Surprisingly, this approach is fast enough for a single game of Boggle, but it's probably not fast enough to run 10,000 times without a long wait.
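The essence of the board-first strategy is pruning: abandon a path as soon as the letters collected so far are not a prefix of any word. Here is a hedged sketch using a TreeSet in place of your ILexicon --- the names are illustrative, and your real class must use the interfaces from the assignment.

```java
import java.util.Set;
import java.util.TreeSet;

// Hedged sketch of board-first search with prefix pruning.
// A TreeSet stands in for the assignment's ILexicon: ceiling() finds
// the smallest word >= the current path, which tells us whether any
// word can still start with that path.
public class BoardFirstDemo {

    static Set<String> findAll(char[][] board, TreeSet<String> lexicon) {
        Set<String> found = new TreeSet<>();
        for (int r = 0; r < board.length; r++)
            for (int c = 0; c < board[0].length; c++)
                extend(board, lexicon, r, c, "",
                       new boolean[board.length][board[0].length], found);
        return found;
    }

    static void extend(char[][] board, TreeSet<String> lexicon, int r, int c,
                       String path, boolean[][] used, Set<String> found) {
        if (r < 0 || r >= board.length || c < 0 || c >= board[0].length
                || used[r][c])
            return;
        String next = path + board[r][c];
        String ceiling = lexicon.ceiling(next);
        if (ceiling == null || !ceiling.startsWith(next))
            return;                         // prune: no word starts with this path
        if (lexicon.contains(next))
            found.add(next);                // the path itself is a word
        used[r][c] = true;                  // choose this cell
        for (int dr = -1; dr <= 1; dr++)
            for (int dc = -1; dc <= 1; dc++)
                extend(board, lexicon, r + dr, c + dc, next, used, found);
        used[r][c] = false;                 // backtrack
    }

    public static void main(String[] args) {
        TreeSet<String> lexicon = new TreeSet<>(
            java.util.Arrays.asList("cat", "cart", "rat", "zoo"));
        char[][] board = { {'c', 'a'}, {'r', 't'} };
        System.out.println(findAll(board, lexicon));  // [cart, cat, rat]
    }
}
```

Without the pruning line, the search would explore every path on the board; with it, whole subtrees of doomed paths are skipped, which is why the lexicon's prefix query matters so much to performance.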

Statistical and Empirical Analyses of Boggle

In your README, report which lexicon implementation is fastest. You should compare all the lexicons and report their relative times --- you shouldn't simply say "this one is fastest". You should test at least three lexicons, and four if you do the extra credit. Explain the methods you used to reach your conclusions and report on the experiments you ran, giving times.

You must write code to play lots of auto-games --- not games in which humans play, but games in which all words are found on thousands of boards; see BoggleStats for a starting point. You must provide the highest-scoring board among all the 4x4 and 5x5 boards you test in running at least 10,000 auto-games (and preferably 50,000). Report how many seconds your code takes for 1,000 games and for 10,000 games (or predict this if you can't run that many), and predict/justify how much time your computer and implementation would take to simulate both 100,000 games and one million games. When doing the experiments be sure to set the random-number generator's seed, as is already done in BoggleStats and described in the howto pages. If you can't run 10,000 auto-games, indicate the maximum number you can run.
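Fixing the seed is what makes timing experiments repeatable: every run then generates the same sequence of boards, so differences in times reflect your implementations rather than luck. A minimal sketch of the pattern (BoggleStats already does the seeding for you; the loop body below is a stand-in, not the real game code):

```java
import java.util.Random;

// Hedged sketch of a repeatable timing experiment. The fixed seed
// means two runs see exactly the same "random" values, so the
// checksum -- a stand-in for per-game scores -- is reproducible.
public class TimingDemo {
    public static void main(String[] args) {
        Random rng = new Random(1234);   // fixed seed => reproducible runs
        long start = System.currentTimeMillis();
        long checksum = 0;
        for (int game = 0; game < 1000; game++) {
            // stand-in for "generate a board and find all words on it"
            checksum += rng.nextInt(100);
        }
        double seconds = (System.currentTimeMillis() - start) / 1000.0;
        System.out.println("1000 games: " + seconds + " s, checksum " + checksum);
    }
}
```

To extrapolate from 1,000 games to 100,000 or one million, note that the work per game is roughly constant, so total time should scale about linearly with the number of games --- say so explicitly in your README when you justify your predictions.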


Grading

This assignment is worth 55 points with 10 extra points available; the breakdown is as follows:

functionality points
GoodWordOnBoardFinder 10
SimpleLexicon 5
BinarySearchLexicon 5
CompressedTrieLexicon 10 (extra)
LexiconFirstAuto 5
BoardFirstAuto 10
README/Stats (10,000 games) 10
README/Lexicon analysis 10

Your README file should list your testing results, all the people with whom you collaborated, and the TAs/UTAs you consulted with. You should include an estimate of how long you spent on the program and what your thoughts are about the assignment.

Submit your README and all of your source code using Eclipse with assignment name boggle.