|
|
|
Week 8, 10/16
Prelab Exercises
Complete the Map In-Class Questions. Really.
Lab Exercises
Markov Review
-
In implementing the new, smarter method for creating random text in the
Markov assignment you're given the
main method below to launch/run the program:
public static void main(String[] args){
IModel model = new MarkovModel();
SimpleViewer view = new SimpleViewer("Compsci 100e Markov Generation", "k count>");
view.setModel(model);
}
According to the instructions for the
assignment you will implement a new class
MapMarkovModel that is a subclass of
the class
AbstractModel.
The new subclass will generate random
text more efficiently -- but it doesn't change that random
text is generated, just how the text is generated.
To use this new class you'll change
the main method to what's shown below (there's only one change!)
public static void main(String[] args){
IModel model = new MapMarkovModel();
SimpleViewer view = new SimpleViewer("Compsci 100e Markov Generation", "k count>");
view.setModel(model);
}
In a few words, why is this change in main enough -- that
is why will the view call the methods in the new class you've
created,
MapMarkovModel and why don't you need to modify the view so
this happens?
(Questions follow about the code in MarkovModel class)
- The code from method
brute
in MarkovModel is reproduced below.
This code constructs random
text. In the code in the class and below,
the instance variable myString is the entire file
read by the initialize method and stored as a
String value:
/**
* Generate N letters using an order-K markov model.
* @param k is
the order of the markov model
* @param numLetters is the number of characters to generate
*/
public void brute(int k, int numLetters) {
int start = myRandom.nextInt(myString.length() - k + 1);
String str = myString.substring(start, start + k);
String wrapAroundString = myString + myString.substring(0,k);
StringBuilder build = new StringBuilder();
ArrayList<Character> list = new ArrayList<Character>();
double stime = System.currentTimeMillis();
for (int i = 0; i < numLetters; i++) {
list.clear();
int pos = 0;
while ((pos = wrapAroundString.indexOf(str, pos)) != -1 && pos < myString.length()) {
char ch = wrapAroundString.charAt(pos + k);
list.add(ch);
pos++;
}
Code continues.
First, a few questions about String methods:
- What is the purpose of the second parameter
pos (whose
initial value is zero) in the call to
String.indexOf in the code above and why is
pos incremented as the last statement of the loop? Two Answers.
- In a few words, what is the reason that
wrapAroundString is used to find matches for the
seed generating random text (the seed is variable str
above) rather than using myString to find
matches?
- A student suggests making
wrapAroundString an
instance variable rather than a local variable in the method
brute. If the instance variable
is myWrapAroundString and
is initialized to null when a file is read in the
initialize method then
the code below is used in method brute for
creating the value stored as myWrapAroundString.
We replace this line:
String wrapAroundString = myString + myString.substring(0,k);
with this line:
if (myWrapAroundString != null ||
myWrapAroundString.length() != myString.length() + k) {
myWrapAroundString = myString + myString.substring(0,k);
}
Explain why this is a reasonably good idea and what the purpose of the
if statement is in guarding the intialization of instance
variable myWrapAroundString. Also, explain an unanticipated
situation in which this code could fail to do the right thing if
myWrapAroundString wasn't initialized to
null when a file is read.
MapMarkovModel
- In the new, map-based model code you're writing
in the class
MapMarkovModel you'll create a map similar to the
following:
Map> myMap;
The keys in the map are each n-gram that occurs in the file.
The value associated with each n-gram is a list of the "following"
n-grams, see the writeup for details.
You might
write code similar to the following to store values in the map
for an order-k Markov model.
for (int j=0; j < myString.length(); j++){
String ngram = myWrapAroundString.substring(j,j+k)
if (!myMap.containsKey(ngram)){
myMap.put(ngram,new ArrayList());
}
ArrayList list = myMap.get(ngram);
list.add(VALUE_NEEDED_HERE);
}
- What is the purpose of the call to
containsKey and
the put statement it guards?
- The variable
ngram takes on the value of every
k-length substring in the file for which a Markov model is generated.
Why is wrapAroundString used instead of
myString?
- For the string
"anthem theatre tithe"
the 3-gram key "the" should be mapped to the three substrings
"hem"
"hea"
"hea"
Explain why the string "hea" is listed twice.
- In the map code above, each time the loop iterates a key
will be associated with a list of following n-grams that is the
key's value. Each element of the list is an n-gram. Once the key is
found, the VALUE_NEEDED_HERE can be
replaced by using one call of
substring what is that code?
WordNgram
There's a second part of the assignment: creating the
WordNgram
class. Some questions about that task to help understand that part of
the project.
- For the Markov assignment you'll need to implement a class
WordNgram where each object that is an instance of the
class stores N words (or k words for an order-k Markov Model). You're
given part of the class definition and must fill in several methods. One
version of equals is shown below --- it
works correctly!. Each
WordNGram object will store the same number of words in
your program -- that number of words is the k in an order-k Word Markov Model.
public class WordNgram {
private String[] myWords;
public boolean equals(Object o){
WordNgram other = (WordNgram) o;
for(int k=0; k < myWords.length; k++) {
if (!myList[k].equals(other.myList[k])) return false;
}
return true;
}
}
- Why is the parameter of type
Object and why is it cast
to a WordNGram?
- Explain in words when the
equals method returns false.
- When does the method return true?
- Why is it possible for
other.myList to be accessed
when
myList is private?
-
The current version of
hashCode works correctly, but it results
in a poorly performing hashmap. In a few words, why would the
performance
be poor?
Complete a version of hashCode that
simply adds the individual values of the Strings in myList
and
returns the result:
public int hashCode() {
int sum = 0;
for(int k=0; k < myList.length; k++) {
sum += myList[k].hashCode();
}
return sum;
}
Explain why this is better than returning 15, but why this method
might still have flaws?
- Explain why it might be better to use a line like this in
hashCode as the body of the loop rather than what's shown
above. In answering the question consider hashCode
values
for the 4-gram "jump big dog jump" and "jump jump big dog"
sum = 100*sum + myList[k].hashCode();
- When using characters in the original Markov program (brute or map
version) the
String.substring method is used to
extract characters from a string and the
String.charAt method is used to extract an individual
character. For the WordNGram class a
wordAt method might be useful. What would it
return -- it's analagous to charAt?
- In a few words, describe what the
compareTo method
should return and how that can be accomplished.
|