Submit using markov as the assignment name. Submit a README, all .java files, and the Analysis.pdf.
MarkovModel faster (by writing a new model, not changing the existing model), you must implement the
WordNgram class, and then you must use that to create a new, word-based Markov Model.
After you snarf the assignment you should run MarkovMain to use the brute-force Markov generator. The GUI associated with the program is shown below.
You use the File menu to browse and select a data file as the training text. There is a data directory provided when you snarf containing several files you can use as training texts when developing your program.
On the left is a screen shot of the program just after alice.txt has been loaded. On the right is a screen shot after generating 300 random characters using an order-4 Markov model with Alice in Wonderland as the training text.
MapMarkovModel that extends the abstract class AbstractModel. Use the existing MarkovModel class to get ideas for your new class. In particular, the communication between the GUI/View and the model works as follows from the model class's perspective:
initialize method of the model is called with a Scanner from which characters and words are read. The code you already have reads all the characters into a string and uses this string to generate characters "at random", but based on their frequency in an order-k Markov model.
process method is called when the user presses the GO button or the return/enter key in the text field of the GUI. The convention in this program is that the Object passed to
process is a String that contains space-separated numbers representing the order k of the Markov model and the number of letters to generate (or number of words). See the existing code for examples.
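As an illustration of the convention just described, here is a minimal sketch of how such a String might be parsed. The class name ProcessArgs and its fields are hypothetical, and the sketch assumes the order k comes first and the count second; check the existing MarkovModel code for the ordering it actually uses.

```java
// Hypothetical helper, not part of the assignment code.
// Assumes the String holds the order k first, then the number of
// characters (or words) to generate, separated by whitespace.
class ProcessArgs {
    final int order;
    final int count;

    ProcessArgs(int order, int count) {
        this.order = order;
        this.count = count;
    }

    static ProcessArgs parse(Object o) {
        String line = (String) o;                  // the convention: o is a String
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected two numbers: " + line);
        }
        // Integer.parseInt throws NumberFormatException on non-numeric input
        return new ProcessArgs(Integer.parseInt(parts[0]),
                               Integer.parseInt(parts[1]));
    }
}
```

A model could catch the exceptions thrown here and report them to the views rather than letting them propagate.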
messageViews with a String, or
showViewsError with a String, respectively. You can either use
super.messageViews since the abstract class from which your model inherits contains these methods.
super.update with a String. The
update method in the views displays a string. To clear the output in the views, call
clear). To display multiple lines, either construct one string to pass to
update repeatedly without clearing.
In the code you're given in MarkovModel, the call below sends the built-up, randomly generated String to the views; see the method brute in the code. The code uses a StringBuilder for efficiency in constructing the character-by-character, randomly generated text.
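To make the brute-force idea concrete, here is a rough sketch of order-k character generation that accumulates its output in a StringBuilder. This is not the brute method you are given: the class name BruteSketch, the method generate, and its end-of-text handling are assumptions made for illustration, and the sketch returns the text instead of sending it to the views.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; the provided MarkovModel may differ in detail.
class BruteSketch {
    // Assumes text.length() >= k.
    static String generate(String text, int k, int count, Random rng) {
        // Pick a random k-character seed from the training text.
        int start = rng.nextInt(text.length() - k + 1);
        String seed = text.substring(start, start + k);
        StringBuilder out = new StringBuilder();   // avoids repeated String copying
        for (int i = 0; i < count; i++) {
            // Rescan the whole text for every occurrence of the seed
            // and record the character that follows each one.
            List<Character> followers = new ArrayList<>();
            int pos = 0;
            while ((pos = text.indexOf(seed, pos)) != -1 && pos + k < text.length()) {
                followers.add(text.charAt(pos + k));
                pos++;
            }
            if (followers.isEmpty()) {
                break;                             // seed occurs only at the very end
            }
            char next = followers.get(rng.nextInt(followers.size()));
            out.append(next);
            seed = (seed + next).substring(1);     // slide the window by one character
        }
        return out.toString();
    }
}
```

The rescan inside the loop is what makes this approach slow: every generated character costs a pass over the entire training text.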
You can modify MarkovMain to use your model by simply changing one line.
WordNgram for word Markov models. For that modification you'll create a new class named
WordMarkovModel that you use in
main. But the word Markov models depend on the
WordNgram class you must develop and test.
Random object used for random-number generation is constructed thusly:
new Random() will result in a different set of random numbers, and thus different text, being generated each time you run the program. This is more amusing, but harder to debug. If you use a seed of 1234 in your smart/Map model you should get the same random text as when the brute-force method is used. This will help you debug your program because you can check your results with those of the code you're given which hopefully you can rely on as being correct.
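The effect of a fixed seed can be seen in a small sketch like the following. The class SeedDemo and the method draw are hypothetical names used only for this illustration; only java.util.Random itself comes from the standard library.

```java
import java.util.Random;

// Hypothetical demo class: two Random objects built from the same seed
// produce the same sequence of values.
class SeedDemo {
    static int[] draw(long seed, int n, int bound) {
        Random rng = new Random(seed);      // seeded: repeatable sequence
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = rng.nextInt(bound); // value in the range [0, bound)
        }
        return values;
    }
}
```

Calling SeedDemo.draw(1234, 10, 26) twice yields identical arrays. Note that two models only generate the same text from the same seed if they also make the same random calls in the same order.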
You'll need to ensure that
.toString work properly and
efficiently. You'll probably need to implement additional methods to
extract state (words) from a
WordNgram object. In my code,
for example, I had at least two additional methods to get information
about the words that are stored in the private state of a
To facilitate testing your
.hashCode methods a JUnit testing program is
provided. You should use this, and you may want to add more tests
to it in testing your implementation.
Testing with JUnit shows that a method passes some test, but the test
may not be complete. For example, your code will be able to pass
the tests for
.hashCode without ensuring that objects that
are equal yield the same hash-value. That should be the case, but
it's not tested in the JUnit test suite you're given.
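One way to check that property yourself is a small helper like the sketch below. The class HashContractCheck and the method consistent are hypothetical names, not part of the provided test suite; the same comparison could be written as an extra assertion in the JUnit file.

```java
// Hypothetical helper: checks one direction of the equals/hashCode
// contract for a pair of objects.
class HashContractCheck {
    // Returns false only when a equals b but their hash values differ.
    // Unequal objects may have any hash values, so they always pass.
    static boolean consistent(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }
}
```

To use it on your own class, build two separate objects from the same words and confirm both that they are equal and that consistent returns true for them.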
To test your
WordNgram class you're given testing code.
This code tests individual methods in your class; these tests are
called unit tests, so
you need to use the standard JUnit
unit-testing library with the
file to test
To choose Run as JUnit test, first use the Run As option in the Run menu as shown on the left below. You have to select the JUnit option as shown on the right below. Most of you will have that as the only option.
There are two tests in
WordNgramTest.java: one for the
.equals and one for the "performance"
If the JUnit tests pass, you'll get all green as shown on the left below. Otherwise you'll get red -- on the right below -- and an indication of the first test to fail. Fix that, go on to more tests. The red was obtained from the code you're given. You'll work to make the testing all green.
WordMarkovModel that extends the abstract class AbstractModel. This should be very similar to the
MapMarkovModel class you wrote, but this class uses words rather than characters.
A sequence of characters was stored as a String in the code for character-oriented Markov models. For this program you'll use ArrayLists (or arrays) of Strings to represent sequences of words used in the model.
The idea is that you'll use 4 words rather than 4 characters in
predicting/generating the next word in an
order-4 word-based Markov Model. You'll need to construct the
WordMarkovModel and implement its methods so that
instead of generating 100 characters at random it generates 100 words at
random (but based on the training text it reads).
To get all words from
a String use the String
split method which returns an
array. The regular expression "\\s+" matches one or more whitespace characters,
which is exactly what you want to get all the words in a file/string.
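A minimal sketch of that call follows. The class SplitDemo and the method words are hypothetical names; the trim and the empty-input check are additions made here because split keeps an empty first element when the text starts with whitespace.

```java
// Hypothetical demo class showing String.split with "\\s+".
class SplitDemo {
    static String[] words(String text) {
        String trimmed = text.trim();        // avoid an empty first element
        if (trimmed.isEmpty()) {
            return new String[0];            // "".split(...) would return {""}
        }
        return trimmed.split("\\s+");        // split on runs of spaces, tabs, newlines
    }
}
```

For example, SplitDemo.words("  the quick\n brown\tfox ") produces the four words the, quick, brown, and fox.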
WordNgram object, is mapped to a list of
WordNgram objects, specifically the n-grams that follow it in the training text. This is exactly what your
MapMarkovModel did, but it mapped a String to a list of Strings. Each String represented a sequence of k characters. In this new model, each
WordNgram represents a sequence of k words. The concept is the same.
WordMarkovModel code you write you'll conceptually replace Strings in the map with WordNgrams. In the code you wrote for maps and strings, calls to
.substring will be replaced by calls to
new WordNgram. This is because
.substring creates a new String from parts of another and returns the new String. In the
WordMarkovModel code you must create a new
WordNgram from the array of strings, so that each key in the word-map, created by calling new, corresponds to a string created in your original program by calling substring.
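The substitution of new for substring can be sketched as follows. The class Gram is a hypothetical stand-in for WordNgram: its constructor parameters (source array, start index, size) and its methods are assumptions made for this illustration, and your WordNgram may look different. The class WordMapSketch and the method build are likewise hypothetical, as is the choice to stop before the last k words so that every key has a complete following n-gram.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for WordNgram: an immutable window of k words.
final class Gram {
    private final String[] words;

    Gram(String[] source, int start, int size) {
        // Copy the window, much as substring copies part of a String.
        words = Arrays.copyOfRange(source, start, start + size);
    }

    @Override public boolean equals(Object o) {
        return o instanceof Gram && Arrays.equals(words, ((Gram) o).words);
    }

    @Override public int hashCode() {
        return Arrays.hashCode(words);   // equal grams get equal hash values
    }

    @Override public String toString() {
        return String.join(" ", words);
    }
}

class WordMapSketch {
    // Maps each k-word gram to the list of grams that follow it.
    static Map<Gram, List<Gram>> build(String[] words, int k) {
        Map<Gram, List<Gram>> follows = new HashMap<>();
        // i + k < words.length guarantees the following gram fits in the array.
        for (int i = 0; i + k < words.length; i++) {
            Gram key = new Gram(words, i, k);        // was: text.substring(i, i + k)
            Gram next = new Gram(words, i + 1, k);   // the gram shifted by one word
            follows.computeIfAbsent(key, g -> new ArrayList<>()).add(next);
        }
        return follows;
    }
}
```

Because a gram is used as a map key, the map only groups repeated n-grams together when equals and hashCode agree, which is why those two methods matter so much in WordNgram.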