Compsci 100, Fall 2009, September 11

You can work with one or two people in answering recitation questions.

Name: _____________________  Netid: _________________


Name: _____________________  Netid: _________________
 

Name: _____________________  Netid: _________________


  1. In implementing the new, smarter method for creating random text in the Markov assignment you're given the main method below to launch/run the program: public static void main(String[] args){ IModel model = new MarkovModel(); SimpleViewer view = new SimpleViewer("Compsci 100 Markov Generation", "k count>"); view.setModel(model); }

    According to the instructions for the assignment you will implement a new class MapMarkovModel that is a subclass of the class AbstractModel. The new subclass will generate random text more efficiently -- but it doesn't change that random text is generated, just how the text is generated. To use this new class you'll change the main method to what's shown below (there's only one change!)

    public static void main(String[] args){ IModel model = new MapMarkovModel(); SimpleViewer view = new SimpleViewer("Compsci 100 Markov Generation", "k count>"); view.setModel(model); }

    In a few words, why is this change in main enough -- that is why will the view call the methods in the new class you've created, MapMarkovModel and why don't you need to modify the view so this happens?

    
    
    
    
    
    
    
    
    
    
    
    (Questions follow about the code in MarkovModel class)

  2. The code from method brute in MarkovModel is reproduced below. This code constructs random text. In the code in the class and below, the instance variable myString is the entire file read by the initialize method and stored as a String value:

    /** * Generate numLetters random characters using an order-k Markov model. * @param k is the order of the markov model * @param numLetters is the number of characters to generate */ public void brute(int k, int numLetters) { int start = myRandom.nextInt(myString.length() - k + 1); String str = myString.substring(start, start + k); String wrapAroundString = myString + myString.substring(0,k); ArrayList<Character> list = new ArrayList<Character>(); for (int i = 0; i < numLetters; i++) { list.clear(); int pos = 0; while ((pos = wrapAroundString.indexOf(str, pos)) != -1 && pos < myString.length()) { char ch = wrapAroundString.charAt(pos + k); list.add(ch); pos++; } // more code here
      First, a few questions about String methods:
    1. What is the purpose of the second parameter pos (whose initial value is zero) in the call to String.indexOf in the code above and why is pos incremented as the last statement of the loop? Two Answers.
      
      
      
      
      
      
      
      
    2. In a few words, what is the reason that wrapAroundString is used to find matches for the seed generating random text (the seed is variable str above) rather than using myString to find matches?
      
      
      
      
      
      
      
      
      
      
      
  3. In the new, map-based model code you're writing in the class MapMarkovModel you'll create a map similar to the following: Map<String,ArrayList<String>> myMap; The keys in the map are each n-gram that occurs in the file. The value associated with each n-gram is a list of the "following" n-grams, see the writeup for details.

    You might write code similar to the following to store values in the map for an order-k Markov model.

    for (int j=0; j < myString.length(); j++){ String ngram = myWrapAroundString.substring(j,j+k) if (!myMap.containsKey(ngram)){ myMap.put(ngram,new ArrayList<String>()); } ArrayList<String> list = myMap.get(ngram); list.add(VALUE_NEEDED_HERE); }
    1. What is the purpose of the call to containsKey and the put statement it guards?
      
      
      
      
      
      
    2. The variable ngram takes on the value of every k-length substring in the file for which a Markov model is generated. Why is wrapAroundString used instead of myString?
      
      
      
      
      
    3. For the string
         "anthem theatre tithe"
      
      
      the 3-gram key "the" should be mapped to the three substrings
       "hem"
       "hea"
       "hea"
      
      Explain why the string "hea" is listed twice.
      
      
      
      
      
      
    4. In the map code above, each time the loop iterates a key will be associated with a list of following n-grams that is the key's value. Each element of the list is an n-gram. Once the key is found, the VALUE_NEEDED_HERE can be replaced by using one call of substring what is that code?
      
      
      
      
      


    WordNgram

    There's a second part of the assignment: creating the WordNgram class. Some questions about that task to help understand that part of the project.

  4. For the Markov assignment you'll need to implement a class WordNgram where each object that is an instance of the class stores N words (or k words for an order-k Markov Model). You're given part of the class definition and must fill in several methods. One version of equals is shown below --- it works correctly!. Each WordNGram object will store the same number of words in your program -- that number of words is the k in an order-k Word Markov Model.

    public class WordNgram { private String[] myWords; public boolean equals(Object o){ WordNgram other = (WordNgram) o; for(int k=0; k < myWords.length; k++) { if (!myList[k].equals(other.myList[k])) return false; } return true; } }
    1. Why is the parameter of type Object and why is it cast to a WordNGram?
      
      
      
      
      
      
      
    2. Explain in words when the equals method returns false.
      
      
      
      
      
      
      
      
    3. When does the method return true?
      
      
      
      
      
      
      
      
      
      
      
      
      
    4. Why is it possible for other.myList to be accessed when myList is private?
      
      
      
      
      
      
      
  5. The current version of hashCode works correctly, but it results in a poorly performing hashmap. In a few words, why would the performance be poor?
    
    
    
    
    
    
    
    
    
    Complete a version of hashCode that simply adds the individual values of the Strings in myList and returns the result: public int hashCode() { int sum = 0; for(int k=0; k < myList.length; k++) { sum += myList[k].hashCode(); } return sum; } Explain why this is better than returning 15, but why this method might still have flaws?
    
    
    
    
    
    
    

  6. Explain why it might be better to use a line like this in hashCode as the body of the loop rather than what's shown above. In answering the question consider hashCode values for the 4-gram "jump big dog jump" and "jump jump big dog" sum = 100*sum + myList[k].hashCode();
  7. When using characters in the original Markov program (brute or map version) the String.substring method is used to extract characters from a string and the String.charAt method is used to extract an individual character. For the WordNGram class a wordAt method might be useful. What would it return -- it's analagous to charAt?
    
    
    
    
    
    
    
  8. In a few words, describe what the compareTo method should return and how that can be accomplished. You shouldn't write code, but a few word description.