CPS 100, Test 2, Practicum
Due date: April 14, 8:00 am, absolutely no extensions
This part of the test should require an hour or two of work.
You are to work completely by yourself. All
communication with TAs/UTAs/Professors/Colleagues/Unknowns should be
done via the newsgroup duke.cs.cps100.
First, copy all the files from ~ola/cps100/maps (these
files are also accessible here) into a directory you create (call the
directory maps). Then compile by typing make usewords. You
should create a link to the literary text files by typing
ln -s ~ola/data data
Then you can run the program, and when prompted for an input file type:
data/poe.txt or data/hamlet.txt
Run usewords (4 points)
You should create a README file in which you answer the questions below
based on running usewords. In your README file you should also
include the number of hours you worked on this assignment.
- Using romeo.txt, hamlet.txt, and
tempest.txt, which two words each occur more than 2% of the
time in all three files? List how many times each of the two
words occurs in each of the three files.
- Which of the three plays has the most unique words, and how
many unique words does it have?
- How many seconds does it take to process
hawthorne.txt and what is the average word length?
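The 2% question comes down to comparing each word's count with 0.02 times the total number of words read. The sketch below illustrates that arithmetic on a string; FrequentWords is a hypothetical helper, not part of usewords.cc, and it assumes words are whitespace-separated and compared exactly as read, which may differ from how usewords treats case and punctuation.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not in usewords.cc): returns, in alphabetical order,
// the words whose share of all words in text is strictly greater than
// threshold, e.g. 0.02 for "more than 2% of the time".
std::vector<std::string> FrequentWords(const std::string& text, double threshold)
{
    assert(threshold >= 0.0 && threshold <= 1.0);

    std::map<std::string, int> counts;   // word -> number of occurrences
    int total = 0;                       // total words read

    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++counts[word];
        ++total;
    }

    std::vector<std::string> result;
    for (const auto& entry : counts) {
        // count / total > threshold, rewritten to avoid dividing by zero
        if (entry.second > threshold * total) {
            result.push_back(entry.first);
        }
    }
    return result;
}
```

Calling FrequentWords on the same text with a threshold of 0.02 for each of the three plays and intersecting the results would give the words common to all three.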
Keep hash statistics (6 points) OPTIONAL! Extra Credit
The hash table used in hmap.cc and declared in hmap.h
is a vector of linked lists. Each element of the linked list
is declared to be Node<Pair<Key,Value> > which in
usewords.cc makes it Node<Pair<string,int> > since
each string is mapped to the number of times the string occurs in a
file. The templated type Pair is declared in map.h; it
has two fields: first (the key) and second (the value).
Add a new member function to the class HMap whose prototype is
shown below (add this to hmap.h).
void HashStats() const;
In the file hmap.cc you implement this function using the header:
template <class Key, class Value>
void HMap<Key,Value>::HashStats() const
{
// add code here
}
This function should compute and print the length of the longest chain
and the average chain length. In calculating the average chain length
count only chains that have at least one node, i.e., if there are 7,001 hash
"buckets", but only 30 have linked-lists/chains, divide by 30 when
calculating the average chain length.
Use the code below as a starting point for calculating statistics.
int numChains = 0;
int k;
for(k=0; k < myList.Length(); k++)
{
Node<Pair<Key,Value> >* ptr = myList[k];
if (ptr != 0)
{
numChains++;
}
}
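As an illustration of the bookkeeping involved, here is a minimal standalone sketch of the chain statistics. It does not use the HMap class: Node here is a stand-in with assumed field names info and next, std::vector replaces the course Vector, and ChainStats and ComputeChainStats are hypothetical names. It returns the numbers rather than printing them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the course's Node type; the field names are assumptions.
template <class T>
struct Node
{
    T info;
    Node<T>* next;
};

// Hypothetical result type for this sketch.
struct ChainStats
{
    int numChains;    // buckets holding at least one node
    int longest;      // length of the longest chain
    double average;   // mean length over non-empty chains only
};

template <class T>
ChainStats ComputeChainStats(const std::vector<Node<T>*>& buckets)
{
    ChainStats stats = {0, 0, 0.0};
    int totalNodes = 0;

    for (std::size_t k = 0; k < buckets.size(); k++) {
        // walk the chain in bucket k and count its nodes
        int length = 0;
        for (const Node<T>* ptr = buckets[k]; ptr != 0; ptr = ptr->next) {
            length++;
        }
        if (length > 0) {
            stats.numChains++;
            totalNodes += length;
            if (length > stats.longest) {
                stats.longest = length;
            }
        }
    }
    // divide by the number of non-empty chains, not the number of buckets
    if (stats.numChains > 0) {
        stats.average = static_cast<double>(totalNodes) / stats.numChains;
    }
    assert(stats.longest <= totalNodes);
    return stats;
}
```

With five buckets holding one chain of three nodes and one chain of one node, this gives 2 chains, a longest chain of 3, and an average of 2.0 (4 nodes divided by 2 chains, not by 5 buckets).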
When your function works, run the program on hawthorne.txt
and melville.txt and include statistics for these files in
your README file. For extra credit, run the program with both a smaller and
larger number of buckets: currently there are 7,001 buckets. Use both
3,001 and 13,001 (which are both prime numbers). Include statistics in
your README file for these numbers (4 points extra credit).
Sorting Words (extra credit) (6 points)
A commented-out line in usewords.cc can
be used to print all the entries in the hash table:
// uncomment line below to print all entries in table
// map.Apply(Print);
However, since the hash table isn't sorted, this will print words in the
order they occur in the hash table. Write a new class that
inherits from MapBase, that you'll use to print the words in the hash
table sorted alphabetically.
To do this, you'll implement the class MapSort
declared below, which uses the struct
WordInfo, also shown:
struct WordInfo
{
string word;
int count; // how many times word occurs
};
class MapSort : public MapBase<string,int>
{
public:
MapSort();
virtual void Function(string & key, int & value);
virtual void Report();
private:
int myCount; // # different words
Vector<WordInfo> myList;
};
You'll need to implement the constructor for MapSort, which
should construct the vector myList and initialize
myCount. Every time that MapSort::Function is called,
you'll need to add another entry to myList, growing it as
necessary. When MapSort::Report is called you'll sort
myList, then print it. You should sort it alphabetically,
but print both the word and the number of times the word occurs.
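The collect-then-sort idea can be sketched outside the course classes as follows. This is only an illustration under stated assumptions: WordCollector is a hypothetical stand-in for MapSort with no MapBase parent, std::vector and std::sort replace the course Vector and whatever sort you choose, and Sorted returns the entries instead of printing them as Report would. Note that std::vector's push_back handles the growth that the course Vector would need done by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct WordInfo
{
    std::string word;
    int count;   // how many times word occurs
};

// Orders entries alphabetically by word.
bool WordLess(const WordInfo& lhs, const WordInfo& rhs)
{
    return lhs.word < rhs.word;
}

// Hypothetical stand-in for MapSort (no MapBase parent in this sketch).
class WordCollector
{
  public:
    WordCollector() : myCount(0) {}

    // Plays the role of MapSort::Function: record one (word, count) entry.
    void Function(const std::string& key, int value)
    {
        WordInfo info = {key, value};
        myList.push_back(info);   // grows the vector as needed
        myCount++;
    }

    // Plays the sorting half of MapSort::Report: returns a sorted copy.
    std::vector<WordInfo> Sorted() const
    {
        assert(myCount == static_cast<int>(myList.size()));
        std::vector<WordInfo> copy(myList);
        std::sort(copy.begin(), copy.end(), WordLess);
        return copy;
    }

  private:
    int myCount;                    // # different words
    std::vector<WordInfo> myList;
};
```

Adding the entries (whale, 3), (ahab, 7), and (sea, 5) in that order and calling Sorted yields ahab, sea, whale, each still paired with its count.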
Submit
To submit use:
submit100 test2 README usewords.cc hmap.h hmap.cc Makefile
Your usewords.cc program should show hash statistics when
compiled and run.
Owen L. Astrachan
Last modified: Sat Apr 12 14:43:21 EDT 1997