Implementing Maps
CPS 100, Spring 1996
Program due April 16 (10% bonus); last day accepted: April 19.
Introduction
The files for this assignment can be found in
~ola/cps100/map. Files include
You should copy these files into a subdirectory you create (check to see
that permissions are set for instructor/ta access).
You will be re-implementing the Map class in two ways:
- Using a hash table and chaining to resolve collisions
- Using a binary search tree (this is optional)
Hash Particulars
You will need to create a .cc file for the class HMap (declared
in hmap.h), a
hash-table implementation of the map class. The constructor for
the HMap class has two parameters, a pointer to a hash function
(such a function is given in words.cc) and the size of the hash table.
It is convenient to use a typedef (or alias) for the pointer to a hash
function. Unfortunately, because the function is templated, using a
typedef is NOT possible. Instead, a full-blown pointer-to-function
declaration is necessary, as shown below (from hmap.h):
template <class Key, class Value>
class HMap : public Map<Key,Value>
{
  public:
    HMap(unsigned int (*hash) (const Key &), int size);
    ~HMap();
    ...
  private:
    ...
    unsigned int (*myFunction)(const Key &);
};
The parameter hash is a pointer to a function. The function
has a Key reference parameter and returns an unsigned int.
This function is stored in the private variable myFunction.
When constructing an HMap object, the name of a hashing function must
be passed to the constructor, as in words.cc:
HMap<string,int> hmap(Hash,7001);
The HMap constructor should initialize myFunction with the value
of the parameter hash; the other member functions then use myFunction
to compute hash values, e.g.:
unsigned int index = (*myFunction)(key) % myList.Length();
Be sure that every value returned by the hash function is mapped into
the range of valid table indices (0 through size - 1) using the modulus
operator %, as shown above (this will probably be done in your function
Find and perhaps in some other functions as well).
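To see the pointer-to-function mechanics in isolation (declare, pass, store, call, then reduce with %), here is a minimal sketch. HashHolder and StringHash are hypothetical names: StringHash is a simple polynomial string hash chosen for illustration, not the Hash function from words.cc, and std::string stands in for the key type. A real HMap would also hold the table and implement the Map operations.

```cpp
#include <cassert>
#include <string>

// Hypothetical hash function with the signature HMap expects:
// a const reference to the key in, an unsigned int out.
// (Illustrative polynomial hash; NOT the Hash from words.cc.)
unsigned int StringHash(const std::string & s)
{
    unsigned int h = 0;
    for (char c : s)
        h = 31 * h + static_cast<unsigned char>(c);
    return h;
}

// Hypothetical stand-in for HMap showing only how the function
// pointer is declared, stored, and called.
template <class Key>
class HashHolder
{
  public:
    HashHolder(unsigned int (*hash)(const Key &), int size)
      : myFunction(hash), mySize(size)
    {
        assert(size > 0);
    }

    // Call the stored function, then map the result to 0 .. mySize-1.
    unsigned int Index(const Key & key) const
    {
        return (*myFunction)(key) % mySize;
    }

  private:
    unsigned int (*myFunction)(const Key &);  // same declaration style as hmap.h
    int mySize;
};
```

For example, `HashHolder<std::string> h(StringHash, 7001);` passes the function's name, which converts to a pointer, in the same way that the HMap constructor receives Hash.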
You should specify a value of 7001 for the size of the hash table (this
is the smallest prime larger than 7000). A vector of 7001 pointers
should be defined, with each pointer initialized to 0. You do NOT need
to use header nodes when using linked lists for chaining. You can
initialize the vector when it is created by using the two-parameter
vector constructor (size and initial value).
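A minimal sketch of chaining without header nodes follows. It assumes std::vector in place of the course vector class and uses a hypothetical ChainTable class rather than the real HMap/Map interface. The table starts as size null pointers via the two-parameter constructor, a new node is linked at the front of its chain, and every hash value is reduced with % before it is used as an index.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical chained hash table (not the course HMap interface).
template <class Key, class Value>
class ChainTable
{
  public:
    ChainTable(unsigned int (*hash)(const Key &), int size)
      : myFunction(hash),
        myList(size, nullptr)     // two-parameter constructor: size, fill value
    {
        assert(size > 0);
    }

    ChainTable(const ChainTable &) = delete;              // nodes are owned;
    ChainTable & operator=(const ChainTable &) = delete;  // forbid shallow copies

    ~ChainTable()
    {
        for (Node * p : myList) {      // free every chain
            while (p != nullptr) {
                Node * next = p->next;
                delete p;
                p = next;
            }
        }
    }

    // Insert key with value, or overwrite the value if key is present.
    void Insert(const Key & key, const Value & value)
    {
        Node * p = Find(key);
        if (p != nullptr) {
            p->value = value;
            return;
        }
        unsigned int index = Index(key);
        myList[index] = new Node(key, value, myList[index]);  // link at front; no header node
    }

    // Copy key's value into value and return true, or return false if absent.
    bool Lookup(const Key & key, Value & value) const
    {
        Node * p = Find(key);
        if (p == nullptr)
            return false;
        value = p->value;
        return true;
    }

  private:
    struct Node
    {
        Key key;
        Value value;
        Node * next;
        Node(const Key & k, const Value & v, Node * n)
          : key(k), value(v), next(n) { }
    };

    // Every hash value is reduced to a valid index with %.
    unsigned int Index(const Key & key) const
    {
        return (*myFunction)(key) % myList.size();
    }

    // Walk the key's chain; return its node or nullptr.
    Node * Find(const Key & key) const
    {
        for (Node * p = myList[Index(key)]; p != nullptr; p = p->next)
            if (p->key == key)
                return p;
        return nullptr;
    }

    unsigned int (*myFunction)(const Key &);
    std::vector<Node *> myList;
};
```

Because an empty chain is just a null pointer, linking at the front needs no special case for the first node in a bucket.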
Hash table Iterator
You will need to implement a hash-table iterator. This will require
both hiterator.h and hiterator.cc.
You'll need to maintain state so that client programs can access the
elements of the hash table one at a time using the iterator class. The
idea is to keep track of the current chain and of the current node
(bucket) within that chain. Instead of a pointer to the current chain,
it can be convenient to keep the index of the current chain in the
hash-table vector (this is shown in hiterator.h). Advancing the iterator
(calling Next()) means advancing the node pointer along its chain; if it
becomes NULL/0, advance the chain index until a non-empty chain is found
(or the end of the hash-table vector is reached).
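The iterator state described above (an index for the current chain plus a pointer to the current node) can be sketched as follows. The node type, the class name ChainIterator, and the First/IsDone/Next/Current member names are assumptions for illustration; match whatever hiterator.h actually declares.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical chain node; in HMap this would be the private node type.
template <class Key, class Value>
struct ChainNode
{
    Key key;
    Value value;
    ChainNode * next;
};

// Hypothetical hash-table iterator: index of current chain + current node.
template <class Key, class Value>
class ChainIterator
{
  public:
    typedef ChainNode<Key, Value> Node;

    explicit ChainIterator(const std::vector<Node *> & table)
      : myTable(table), myIndex(0), myCurrent(nullptr)
    { }

    // Position at the first node of the first non-empty chain.
    void First()
    {
        myIndex = 0;
        SkipEmptyChains();
    }

    bool IsDone() const
    {
        return myCurrent == nullptr;
    }

    // Advance within the chain; if that runs off the end, move on to the
    // next non-empty chain (or to the end of the table).
    void Next()
    {
        assert(!IsDone());
        myCurrent = myCurrent->next;
        if (myCurrent == nullptr) {
            ++myIndex;
            SkipEmptyChains();
        }
    }

    Node & Current() const
    {
        assert(!IsDone());
        return *myCurrent;
    }

  private:
    // Starting at myIndex, find the first non-empty chain (if any).
    void SkipEmptyChains()
    {
        while (myIndex < myTable.size() && myTable[myIndex] == nullptr)
            ++myIndex;
        myCurrent = (myIndex < myTable.size()) ? myTable[myIndex] : nullptr;
    }

    const std::vector<Node *> & myTable;
    std::size_t myIndex;   // index of the current chain
    Node * myCurrent;      // current node within that chain
};
```

A client loop then reads `for (it.First(); !it.IsDone(); it.Next()) { ... it.Current() ... }`.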
Modifying words.cc
You should modify words.cc and templatewords.cc (and
the Makefile) so that a hash-table implementation of a map is tested
along with the unsorted and sorted vector implementations. You should
change the ProcessWords function so that, instead of printing all words
with more than 20 occurrences, it prints all words whose number of
occurrences is more than 2% of the total number of words. For example,
if a file has 2,000 words (including duplicates), then all words that
occur more than 40 times should be printed. You do NOT need to use
an iterator for this, but you MUST test your iterator in some way to
ensure it works. You should also turn in timing results from testing
the hash implementation on Shakespeare's Romeo and Juliet and Hamlet,
in addition to poe.txt, twain.txt, and hawthorne.txt (all in the data
directory). Include these statistics in your README file. For extra
credit, you can experiment with different hash-table sizes; include
these timing results, with an explanation, in your README file. For even
more extra credit, you can modify your hashing implementation so that
when a node is found (in a chain) it is moved to the front of the
chain, which makes it faster to find the next time it is searched for.
Include statistics for this move-to-front modification.
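For the move-to-front extra credit, one way to restructure a chain search is sketched below, assuming a singly linked chain with no header node. MtfNode and FindMoveToFront are hypothetical names. Passing the chain's first pointer by reference lets the function update the table slot itself when the found node is relinked.

```cpp
#include <cassert>

// Hypothetical node type for one chain (no header node).
template <class Key, class Value>
struct MtfNode
{
    Key key;
    Value value;
    MtfNode * next;
};

// Search the chain that starts at front for key. If found and not already
// first, unlink the node and relink it as the new first node, updating front.
// Returns the found node, or nullptr if key is not in the chain.
template <class Key, class Value>
MtfNode<Key, Value> * FindMoveToFront(MtfNode<Key, Value> * & front, const Key & key)
{
    MtfNode<Key, Value> * prev = nullptr;
    for (MtfNode<Key, Value> * p = front; p != nullptr; prev = p, p = p->next) {
        if (p->key == key) {
            if (prev != nullptr) {      // not already at the front
                prev->next = p->next;   // unlink p
                p->next = front;        // relink p ahead of the old first node
                front = p;
            }
            return p;
        }
    }
    return nullptr;
}
```

Inside a hash table, the first argument would be the vector slot for the key's chain, so frequently searched words drift toward the front of their chains.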
You may want to time your words.cc program using the online text of
the Bible, which is very large. Advice: do NOT test this using the
vector classes unless you want to wait a while.
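For the 2% rule in ProcessWords, integer arithmetic avoids floating-point rounding: a count is more than 2% of the total exactly when count * 50 > total. The sketch below takes "more than" literally, so a word occurring exactly 40 times in a 2,000-word file is not printed; change > to >= if the intended rule is "at least 2%". IsFrequent and FrequentWords are hypothetical helpers, and std::map stands in for the course Map class.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: true if count is more than 2% of total,
// tested as count * 50 > total to stay in integer arithmetic.
bool IsFrequent(long count, long total)
{
    assert(count >= 0 && total >= 0);
    return count * 50 > total;
}

// Hypothetical helper: words whose counts exceed 2% of all words
// (duplicates included). std::map stands in for the course Map.
std::vector<std::string> FrequentWords(const std::map<std::string, int> & counts)
{
    long total = 0;
    for (const auto & entry : counts)
        total += entry.second;

    std::vector<std::string> result;
    for (const auto & entry : counts)
        if (IsFrequent(entry.second, total))
            result.push_back(entry.first);
    return result;
}
```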
OPTIONAL Binary Search Trees
This is optional. Points earned here can be applied
to the second test or to any other program.
You are also to implement a class BMap that uses binary search
trees to implement a map class. This will involve creating both a
bmap.h and a bmap.cc file. You will also need to create an iterator
class that uses an inorder traversal to access the elements of the
binary search tree one at a time. There is code in Weiss for this, but
it is much more convoluted than it needs to be. I will post guidelines
indicating how to write the iterator much more easily, although you are
free to use what's discussed in Weiss.
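The core BMap operations might look like the following sketch. It uses a hypothetical BstMap class rather than the course Map interface and assumes keys support operator<. Insert walks down the tree holding a pointer to the link that should point at the key's node, so inserting into an empty tree needs no special case.

```cpp
#include <cassert>

// Hypothetical BST-backed map (not the course Map interface).
template <class Key, class Value>
class BstMap
{
  public:
    BstMap() : myRoot(nullptr) { }
    ~BstMap() { Destroy(myRoot); }
    BstMap(const BstMap &) = delete;
    BstMap & operator=(const BstMap &) = delete;

    // Insert key, or overwrite its value if it is already present.
    void Insert(const Key & key, const Value & value)
    {
        Node ** link = &myRoot;          // the link that should hold key's node
        while (*link != nullptr) {
            if (key < (*link)->key)
                link = &(*link)->left;
            else if ((*link)->key < key)
                link = &(*link)->right;
            else {
                (*link)->value = value;  // key already present
                return;
            }
        }
        *link = new Node(key, value);
    }

    // Copy key's value into value and return true, or return false if absent.
    bool Lookup(const Key & key, Value & value) const
    {
        Node * p = myRoot;
        while (p != nullptr) {
            if (key < p->key)
                p = p->left;
            else if (p->key < key)
                p = p->right;
            else {
                value = p->value;
                return true;
            }
        }
        return false;
    }

  private:
    struct Node
    {
        Key key;
        Value value;
        Node * left;
        Node * right;
        Node(const Key & k, const Value & v)
          : key(k), value(v), left(nullptr), right(nullptr) { }
    };

    // Postorder deletion of the whole tree.
    static void Destroy(Node * t)
    {
        if (t != nullptr) {
            Destroy(t->left);
            Destroy(t->right);
            delete t;
        }
    }

    Node * myRoot;
};
```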
There are also some hints on binary search tree iterators.
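One common way to write an inorder iterator is with an explicit stack of nodes, shown below as a sketch; it is not necessarily the approach in the posted guidelines. TreeNode and InorderIterator are hypothetical names, and the First/IsDone/Next/Current interface is an assumption. Each Next() pops the current node and pushes the leftmost path of its right subtree, so every node is pushed and popped exactly once over a full traversal.

```cpp
#include <cassert>
#include <stack>

// Hypothetical tree node type.
template <class Key>
struct TreeNode
{
    Key key;
    TreeNode * left;
    TreeNode * right;
};

// Inorder iterator using an explicit stack; the top of the stack is the
// current node, and the nodes below it are ancestors still to be visited.
template <class Key>
class InorderIterator
{
  public:
    explicit InorderIterator(TreeNode<Key> * root) : myRoot(root) { }

    void First()
    {
        myStack = std::stack<TreeNode<Key> *>();   // discard any old state
        PushLeftSpine(myRoot);                     // smallest key ends on top
    }

    bool IsDone() const { return myStack.empty(); }

    const Key & Current() const
    {
        assert(!IsDone());
        return myStack.top()->key;
    }

    // Pop the current node, then push the leftmost path of its right subtree.
    void Next()
    {
        assert(!IsDone());
        TreeNode<Key> * node = myStack.top();
        myStack.pop();
        PushLeftSpine(node->right);
    }

  private:
    void PushLeftSpine(TreeNode<Key> * t)
    {
        for (; t != nullptr; t = t->left)
            myStack.push(t);
    }

    TreeNode<Key> * myRoot;
    std::stack<TreeNode<Key> *> myStack;
};
```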
You must also use the binary search tree map class in words.cc
and compare its performance to the hash implementation (and optionally
to the unsorted/sorted vector implementations).
Grading
The hash implementation is worth 20 points:
- Five points for the runs and README
- Five points for the hash-table iterator
- Five points for the hash-table class (how well it is implemented, e.g., correctness)
- Five points for style/robustness of code
The binary search tree is worth 12 points, allocated 3, 3, 3, 3.
What to submit
You should submit a Makefile, a README, and all modified source code,
including words.cc and the code for hashing.
Submit using
submit100 map README Makefile words.cc .....