Compsci 100e, Fall 2009, Map Questions

name: ___________________________________

name: ___________________________________

name: ___________________________________


Part A: Isomorphic Words

In the Isomorphic Words APT you must return the number of pairs of words that are isomorphic.

  1. How many pairs are there if there are N array elements/words? Try for approximate and exact answers.
    
    
    
    
    
  2. If the APT asked for the number of words start with the letter 'A', could you solve the problem? How?
    
    
    
    
    
  3. If the APT asked for the number of pairs of words start with the letter 'A', could you solve the problem? How?
    
    
    
    
    
    
  4. If the APT asked for the number of pairs of words that start with the same letter, could you solve the problem? How?
    
    
    
    
    

  5. Two ideas to determine if two words A and B are isomorphic:

    1. For word A, process each letter in turn and map it to the corresponding letter in the word B. If you process the same letter a second (or third or ...) time, check what it maps to in the map and what it corresponds to in word B, these should be the same.
      
      
      
      
      
      
    2. Create a canonical form so that ababd maps to ABABC and exexa maps to ABABC as well. These words are isomorphic and have the same canonical form. How is the form determined?
      
      
      
      

  6. Complete the Isomorphic Words APT

Part B: Maps

Snarf classwork/05_Maps or view the browsable code.
  1. IMapper provides a basic interface to a Map. We have given you an implementation UtilMapHash based on the java.util.HashMap. Create a class ArrayListHash where there is an ArrayList of buckets and each bucket is an ArrayList<Combo> where Combo will be an inner class that stores and integer value and a string key. How does the performance of ArrayListHash compare to UtilMapHash? Test using HashMain

  2. A key word in context or KWIC index provides a list of all the words in a document in the context in which the words occur.

    For example, some occurrences of love and hate from Shakespeare's Romeo and Juliet are shown below, the formatting isn't pretty but you can see several occurrences of each of the [love,hate] in the context in which the word occurs in the play.

    talk of peace? I hate the word As I 
    the word As I hate hell, all Montagues, and 
    better ended by their hate Than death prorogued, wanting 
    lives, By doing damned hate upon thyself? Why railest 
    But thankful even for hate that is meant love. 
    

    not. She whom I love now Doth grace for 
    grace for grace and love for love allow. The 
    grace and love for love allow. The other did 
    she knew well Thy love did read by rote, 
    

    You'll be looking at code in a new class ContextModel that finds key words in context as follows.

    Every word from a file is stored in an array in the order in which the word appears. Because of the miracle of the String.split method it's simple to get this array of every word.

    We'll call the array myWords, it's state in the ContextModel class.

    Keep track of every unique word (uword) and where uword occurs in myWords by storing the index of each occurrence of uword in a list of indexes associated with the uword. This information is stored in a map in which each uword is a key and a list of integer indices are a key's associated value: the locations of the uword in the array myWords

    Given a word, find all its indices, then use those indices to get the contexts, show these contexts.

    How the View communicate with the Model

    When the user loads a file, all the words are stored. The model's initialize method takes care of this.

    When the user enters a word and presses go, the word is sent to the model's process method as a String. In the code you're given, the map of key words to their indexes is constructed as needed. Then the map is used to generate contexts. Currently the map-filling code is missing (in what you get before class).

    You'll complete method fillMap and the code to build the context strings.

    1. The user enters a word in the GUI and the word is passed to the model's process method. Why is the map's size used to determine if the map should be filled? When does the map need to be filled and when can the "old" map be used?
      
      
      
      
      
      
      
    2. If the code shown in process here is removed what will happen when the user types a word that doesn't appear in the file (or has too few letters)? if (!myMap.containsKey(word)) { this.showViewsError("word " + word + " not found"); return; }
    3. What is the purpose of the continue statement in building the context string? When will it be executed? for (int i = index - CONTEXT_SIZE; i <= index + CONTEXT_SIZE; i++) { if (i < 0 || i >= myWords.length) continue; build.append(myWords[i] + " "); }

    4. Finish the code in fillMap (shown below as well) private void fillMap() { for (int k = 0; k < myWords.length; k++) { if (myWords[k].length() >= MIN_LENGTH) { List<Integer> all = myMap.get(myWords[k]); // add code here } } }
    5. How would you make the keyword appear in uppercase in the GUI by modifying code in the model?
      
      
      
      
      
      
      
      
    6. If you want to print a list of every keyword in context, you could modify the code in the model to iterate over the keys in the map-- one new line is added below to the code in ContextModel.process. What line is added and why would using a TreeMap instead of a HashMap be better (changing the constructor of ContextModel)? ArrayList<String> display = new ArrayList<String>(); StringBuilder build = new StringBuilder(); for(String word : myMap.keySet()) { for (int index : myMap.get(word)) { build.delete(0, build.length()); for (int i = index - CONTEXT_SIZE; i <= index + CONTEXT_SIZE; i++) { if (i < 0 || i >= myWords.length) continue; build.append(myWords[i] + " "); } display.add(build.toString()); } } this.notifyViews(display); this.messageViews("# contexts: " + display.size());

Part C: Frequencies

Review Frequencies.java and answer the questions below about the methods doFreqsA and doFreqsB from that code.

When run using Melville's Bartleby The Scrivener with 4,256 unique words. The output is as shown below (it's not side by side, but vertical when actually run-- below) doFreqsA on the left and doFreqsB on the right.

total # words read: 14353
1	&c.,                                  1	&c.,				     
1	'Tis				      1	'Tis				     
1	'em-can't			      1	'em-can't			     
1	(				      1	(				     
1	(Nippers's)			      1	(Nippers's)			     
1	(The				      1	(The				     
1	(and				      1	(and				     
1	(as				      1	(as				     
1	(except				      1	(except				     
2	(for				      2	(for				     
1	(he				      1	(he				     
1	(one				      1	(one				     
2	(the				      2	(the				     
1	6				      1	6				     
6	A				      6	A				     
1	According			      1	According			     
1	Accordingly			      1	Accordingly			     
1	Accordingly,			      1	Accordingly,			     
1	Acting				      1	Acting				     
1	Adam				      1	Adam				     
time to complete: 1.859000		      time to completed 0.020000 

Here's the output from Hamlet with 7,806 unique words
total # words read: 31955
1	&c.'                                  1	&c.'		  
1	''Tis				      1	''Tis		  
5	'A				      5	'A		  
1	'A-down				      1	'A-down		  
1	'Adieu,				      1	'Adieu,		  
1	'And				      1	'And		  
1	'Anon				      1	'Anon		  
1	'As				      1	'As		  
1	'Before				      1	'Before		  
1	'But				      1	'But		  
1	'By-and-by'			      1	'By-and-by'	  
1	'Choose				      1	'Choose		  
1	'Doubt				      1	'Doubt		  
1	'For				      1	'For		  
1	'Forgive			      1	'Forgive	  
1	'Gainst				      1	'Gainst		  
2	'Good				      2	'Good		  
1	'HAMLET.'			      1	'HAMLET.'	  
1	'He				      1	'He		  
1	'Horatio,			      1	'Horatio,	  
time to complete: 7.534000		      time to completed 0.037000

Here's the output from A Scarlet Letter with 14,124 unique words:


total # words read: 85754
1	                                      1			  
98	"				      98  "		  
9	"A				      9	  "A		  
1	"Adorn				      1	  "Adorn		  
1	"Advise				      1	  "Advise		  
1	"Ah!				      1	  "Ah!		  
2	"Ah,				      2	  "Ah,		  
1	"Ah,"				      1	  "Ah,"		  
1	"Aha!				      1	  "Aha!		  
1	"Alas!				      1	  "Alas!		  
1	"All				      1	  "All		  
1	"Alone,				      1	  "Alone,		  
1	"Am				      1	  "Am		  
1	"An				      1	  "An		  
21	"And				      21  "And		  
2	"And,				      2	  "And,		  
1	"Annals")			      1	  "Annals")	  
2	"Art				      2	  "Art		  
2	"Arthur				      2	  "Arthur		  
1	"As				      1	  "As		  
time to complete: 36.026000		      time to completed 0.097000

  1. Write a two-to-three sentence description of how the method doFreqsA works, e.g., how it results in a list of each different word read and the number of times each word occurs.
    
    
    
    
    
    
    
  2. What is the purpose of the line Collections.frequency(list,s) in the code?
    
    
    
    
    
    
    
  3. What is the purpose of the break statement in the loop that prints words and frequencies?
    
    
    
    
    
    
    
  4. The running times of doFreqsA changes depending on both N the total number of words read, and U the number of unique/different words read. Suppose three files contain the same value for U, e.g., each contains 5,000 different values; but each file has a different number of total words N -- suppose this number is 20,000; 30,000; and 40,000 for the three files.

    If the running time for doFreqsA on the 20,000 word file is 12 seconds what do you expect the running time to be for the other two files and why? Explain your reasoning.

    
    
    
    
    
    
    
    
    
    
    
    
    
    
  5. Come up with an expression for the running time of doFreqsA in terms of both N and U and a justification for your expression.
    
    
    
    
    
    
    
    
    
    
  6. The idea in doFreqsB is to use a map which is a collection of keys and values. In this case the key is a string or word and the value associated with each key is the number of times the key occurs. You can put values into a map and get values out of a map, typically you do this by reference to a key.

    What is the purpose of the code guarded by if ! map.containsKey(s))... as shown? Explain in words.

    
    
    
    
    
    
    
    
    
  7. Why does the output for both doFreqsA and doFreqsB display words in the same order?
    
    
    
    
    
    
    
    
    
  8. The running time for accessing elements in a TreeSet or TreeMap is proportional to the (base 2) logarithm of the number of elements in the set or map. Explain why the running time of doFreqsB is so much faster than the running time of the other method. Make reference to N and U from above if possible.
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
  9. Another option is to store the frequencies with the words in a WordCount object as in an earlier classwork. Consider the FrequenciesSorted class that uses a WordCount object to store a word and its frequency.
    1. Add code to create an ArrayList of WordCount objects from the map.
    2. Sort the list of WordCount objects as in doFreqsA by frequency of occurrence. Why do you need to use Collection.sort instead of Arrays.sort?
    3. Give the running time of the code you wrote in the previous two parts in terms of both N and U and a justification for your expression.