Compsci 100, Fall 2009, August 31, Frequencies


Name: _____________________  Netid: _________________
 
Name: _____________________  Netid: _________________

Name: _____________________  Netid: _________________
  1. You're given a printout for Frequencies.java and some questions below about the methods doFreqsA and doFreqsB from that code.

    When run using Melville's Bartleby The Scrivener with 4,256 unique words. The output is as shown below (it's not side by side, but vertical when actually run-- below) doFreqsA on the left and doFreqsB on the right.

    total # words read: 14353
    1	&c.,                                  1	&c.,				     
    1	'Tis				      1	'Tis				     
    1	'em-can't			      1	'em-can't			     
    1	(				      1	(				     
    1	(Nippers's)			      1	(Nippers's)			     
    1	(The				      1	(The				     
    1	(and				      1	(and				     
    1	(as				      1	(as				     
    1	(except				      1	(except				     
    2	(for				      2	(for				     
    1	(he				      1	(he				     
    1	(one				      1	(one				     
    2	(the				      2	(the				     
    1	6				      1	6				     
    6	A				      6	A				     
    1	According			      1	According			     
    1	Accordingly			      1	Accordingly			     
    1	Accordingly,			      1	Accordingly,			     
    1	Acting				      1	Acting				     
    1	Adam				      1	Adam				     
    time to complete: 1.859000		      time to completed 0.020000 
    
    
    Here's the output from Hamlet with 7,806 unique words
    total # words read: 31955
    1	&c.'                                  1	&c.'		  
    1	''Tis				      1	''Tis		  
    5	'A				      5	'A		  
    1	'A-down				      1	'A-down		  
    1	'Adieu,				      1	'Adieu,		  
    1	'And				      1	'And		  
    1	'Anon				      1	'Anon		  
    1	'As				      1	'As		  
    1	'Before				      1	'Before		  
    1	'But				      1	'But		  
    1	'By-and-by'			      1	'By-and-by'	  
    1	'Choose				      1	'Choose		  
    1	'Doubt				      1	'Doubt		  
    1	'For				      1	'For		  
    1	'Forgive			      1	'Forgive	  
    1	'Gainst				      1	'Gainst		  
    2	'Good				      2	'Good		  
    1	'HAMLET.'			      1	'HAMLET.'	  
    1	'He				      1	'He		  
    1	'Horatio,			      1	'Horatio,	  
    time to complete: 7.534000		      time to completed 0.037000
    

    Here's the output from A Scarlet Letter with 14,124 unique words:

    
    total # words read: 85754
    1	                                      1			  
    98	"				      98  "		  
    9	"A				      9	  "A		  
    1	"Adorn				      1	  "Adorn		  
    1	"Advise				      1	  "Advise		  
    1	"Ah!				      1	  "Ah!		  
    2	"Ah,				      2	  "Ah,		  
    1	"Ah,"				      1	  "Ah,"		  
    1	"Aha!				      1	  "Aha!		  
    1	"Alas!				      1	  "Alas!		  
    1	"All				      1	  "All		  
    1	"Alone,				      1	  "Alone,		  
    1	"Am				      1	  "Am		  
    1	"An				      1	  "An		  
    21	"And				      21  "And		  
    2	"And,				      2	  "And,		  
    1	"Annals")			      1	  "Annals")	  
    2	"Art				      2	  "Art		  
    2	"Arthur				      2	  "Arthur		  
    1	"As				      1	  "As		  
    time to complete: 36.026000		      time to completed 0.097000
    
    

    1. Write a two-to-three sentence description of how the method doFreqsA works, e.g., how it results in a list of each different word read and the number of times each word occurs.
      
      
      
      
      
      
      
    2. What is the purpose of the line Collections.frequency(list,s) in the code?
      
      
      
      
      
      
      
    3. What is the purpose of the break statement in the loop that prints words and frequencies?
      
      
      
      
      
      
      
    4. The running times of doFreqsA changes depending on both N the total number of words read, and U the number of unique/different words read. Suppose three files contain the same value for U, e.g., each contains 5,000 different values; but each file has a different number of total words N -- suppose this number is 20,000; 30,000; and 40,000 for the three files.

      If the running time for doFreqsA on the 20,000 word file is 12 seconds what do you expect the running time to be for the other two files and why? Explain your reasoning.

      
      
      
      
      
      
      
      
      
      
      
      
      
      
    5. Come up with an expression for the running time of doFreqsA in terms of both N and U and a justification for your expression.
      
      
      
      
      
      
      
      
      
      
    6. The idea in doFreqsB is to use a map which is a collection of keys and values. In this case the key is a string or word and the value associated with each key is the number of times the key occurs. You can put values into a map and get values out of a map, typically you do this by reference to a key.

      What is the purpose of the code guarded by if ! map.containsKey(s))... as shown? Explain in words.

      
      
      
      
      
      
      
      
      
    7. Why does the output for both doFreqsA and doFreqsB display words in the same order?
      
      
      
      
      
      
      
      
      
  2. We will discuss and explain why the running time for accessing elements in a TreeSet or TreeMap is proportional to the (base 2) logarithm of the number of elements in the map. Explain why the running time of doFreqsB is so much faster than the running time of the other method. Make reference to N and U from above if possible.