Implementing Maps
CPS 100, Spring 1996


Program due April 16 (10% bonus); the last day it will be accepted is April 19.

Introduction

The files for this assignment can be found in ~ola/cps100/map.

You should copy these files into a subdirectory you create (check that permissions are set for instructor/TA access).

You will be re-implementing the Map class in two ways:

  1. Using a hash table and chaining to resolve collisions
  2. Using a binary search tree (this is optional)


Hash Particulars

You will need to create a .cc file for the class HMap (declared in hmap.h), a hash-table implementation of the map class. The constructor for the HMap class has two parameters: a pointer to a hash function (such a function is given in words.cc) and the size of the hash table. It is usually convenient to use a typedef (or alias) for the pointer to a hash function. Unfortunately, because the function is templated, using a typedef is NOT possible. Instead, a full-blown pointer-to-function declaration is necessary, as shown below (from hmap.h):

template <class Key, class Value>
class HMap : public Map<Key,Value>
{
  public:
    HMap(unsigned int (*hash) (const Key &), int size);
    ~HMap();
    ...
  private:
    ...
    unsigned int (*myFunction)(const Key &);
};

The parameter hash is a pointer to a function. The function has a const Key reference parameter and returns an unsigned int. This pointer is stored in the private variable myFunction. When constructing an HMap object, the name of a hashing function must be passed to the constructor as shown below (e.g., in words.cc):

HMap<string,int> hmap(Hash,7001);

The HMap constructor will initialize myFunction with the value of the parameter hash, then use myFunction to compute hash values, e.g., as shown below.

unsigned int index = (*myFunction)(key) % myList.Length();

Be sure that every value returned by this function is reduced to a valid index into the hash table using the modulus operator % as shown above (this will probably be done in your function Find and perhaps in some other functions as well).
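The sketch below shows one way the stored function pointer and the modulus step might fit together. It is only an illustration under stated assumptions: the names SketchHash and Bucketizer are hypothetical (they are not in hmap.h or words.cc), and std::string stands in for whatever string class the course files use.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the Hash function supplied in words.cc.
unsigned int SketchHash(const std::string & key)
{
    unsigned int h = 0;
    for (std::string::size_type i = 0; i < key.length(); i++) {
        h = h * 31 + static_cast<unsigned char>(key[i]);
    }
    return h;
}

// Hypothetical class: shows only how the function pointer is stored and used.
template <class Key>
class Bucketizer
{
  public:
    Bucketizer(unsigned int (*hash)(const Key &), int size)
      : myFunction(hash), mySize(size)
    {
        assert(hash != 0 && size > 0);
    }

    // Map a key to a chain index in the range [0, size).
    unsigned int IndexOf(const Key & key) const
    {
        return (*myFunction)(key) % mySize;
    }

  private:
    unsigned int (*myFunction)(const Key &);  // same declaration style as hmap.h
    unsigned int mySize;
};
```

A client would construct it the same way HMap is constructed, e.g., Bucketizer<std::string> b(SketchHash, 7001); every index returned by b.IndexOf is then less than 7001.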

You should specify a value of 7001 for the size of the hash table (this is the smallest prime larger than 7000). A vector of 7001 pointers should be defined, and each pointer should be initialized to 0. You do NOT need to use header nodes when using linked lists for chaining. You can initialize the vector when it is created using the two-parameter vector constructor.
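As a rough illustration of chaining without header nodes, the sketch below builds a table of 0 pointers and inserts at the front of a chain. It assumes std::vector in place of the course vector class, and the Node layout and the names MakeTable, FindInChain, and InsertFront are hypothetical, not taken from hmap.h.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type; the real one belongs in hmap.h.
struct Node
{
    std::string key;
    int value;
    Node * next;
    Node(const std::string & k, int v, Node * n) : key(k), value(v), next(n) {}
};

// Two-parameter vector constructor: size, then the initial value (0)
// given to every chain pointer.
std::vector<Node *> MakeTable(int size)
{
    return std::vector<Node *>(size, static_cast<Node *>(0));
}

// Search one chain; returns 0 if the key is absent.
Node * FindInChain(Node * first, const std::string & key)
{
    for (Node * p = first; p != 0; p = p->next) {
        if (p->key == key) return p;
    }
    return 0;
}

// Insert at the front of a chain. No header node is needed:
// an empty chain is simply a 0 pointer in the table.
void InsertFront(std::vector<Node *> & table, unsigned int index,
                 const std::string & key, int value)
{
    assert(index < table.size());
    table[index] = new Node(key, value, table[index]);
}
```

Inserting at the front keeps insertion constant time; a real implementation would also need a destructor that deletes every node in every chain.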


Hash table Iterator

You will need to implement a hash-table iterator. This will require both hiterator.h and hiterator.cc. You'll need to maintain state so that client programs can access one element of the hash table at a time using the iterator class. The idea is to keep a pointer to a current chain, and one to a current bucket. Instead of a pointer to the current chain, it can be convenient to keep an index of the current chain (this is shown in hiterator.h). Advancing the iterator (calling Next()) means advancing the bucket pointer and, if it is NULL/0, advancing the chain index until a non-empty chain is found (or the end of the hash-table vector is encountered).
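The advancing logic described above might look roughly like the sketch below. Only Next() is named in this handout; the class name ChainIterator, the other member functions, the Node layout, and the use of std::vector are all assumptions made for illustration and may not match hiterator.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Node { std::string key; int value; Node * next; };  // hypothetical

class ChainIterator  // hypothetical name and interface
{
  public:
    explicit ChainIterator(const std::vector<Node *> & table)
      : myTable(table), myChain(0), myCurrent(0)
    {
        First();
    }

    // Position at the first node of the first non-empty chain.
    void First()
    {
        myChain = 0;
        myCurrent = myTable.empty() ? 0 : myTable[0];
        SkipEmptyChains();
    }

    bool HasMore() const { return myCurrent != 0; }

    const Node & Current() const
    {
        assert(myCurrent != 0);
        return *myCurrent;
    }

    // Advance within the chain, then move on to later chains if needed.
    void Next()
    {
        assert(myCurrent != 0);
        myCurrent = myCurrent->next;
        SkipEmptyChains();
    }

  private:
    // While the node pointer is 0, step to the next chain until a
    // non-empty one is found or the end of the vector is reached.
    void SkipEmptyChains()
    {
        while (myCurrent == 0 && myChain + 1 < myTable.size()) {
            myChain++;
            myCurrent = myTable[myChain];
        }
    }

    const std::vector<Node *> & myTable;
    std::size_t myChain;   // index of the current chain
    Node * myCurrent;      // current node within that chain
};
```

A client loop would then read for (ChainIterator it(table); it.HasMore(); it.Next()) and use it.Current() inside the body. Elements come out in chain order, not in sorted key order.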

Modifying words.cc

You should modify words.cc and templatewords.cc (and the Makefile) so that a hash-table implementation of a map is tested along with the unsorted and sorted vector implementations. You should change the ProcessWords function so that instead of printing all words with more than 20 occurrences, it prints all words whose number of occurrences is more than 2% of the total number of words. For example, if a file has 2,000 words (including duplicates), then all words that occur at least 40 times should be printed. You do NOT need to use an iterator for this, but you MUST test your iterator in some way to ensure it works.

You should also turn in timing results from testing the hash implementation on Shakespeare's Romeo and Juliet and Hamlet, in addition to poe.txt, twain.txt, and hawthorne.txt (all in the data directory). Include these statistics in your README file.

For extra credit, you can experiment with hash tables of different sizes; include these timing results with an explanation in your README file. For even more extra credit, you can modify your hashing implementation so that when a node is found (in a chain) the node is moved to the front of the chain (so that it will be found faster the next time). Include statistics for this move-to-front modification.
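For the move-to-front extra credit, one possible way to relink a node once it is found is sketched below. This is a sketch under assumptions, not a required design: the Node layout and the function name FindMoveToFront are hypothetical, and std::string stands in for the course string class.

```cpp
#include <cassert>
#include <string>

struct Node { std::string key; int value; Node * next; };  // hypothetical

// Find key in the chain starting at first. If it is found and is not
// already the first node, unlink it and relink it at the front.
// first is passed by reference so the table entry itself is updated.
// Returns the node, or 0 if the key is absent.
Node * FindMoveToFront(Node * & first, const std::string & key)
{
    Node * prev = 0;
    for (Node * p = first; p != 0; prev = p, p = p->next) {
        if (p->key == key) {
            if (prev != 0) {
                prev->next = p->next;  // unlink from its current position
                p->next = first;       // relink ahead of the old first node
                first = p;
            }
            return p;
        }
    }
    return 0;
}
```

Because the chain is modified during a lookup, a Find built this way cannot be a const member function, which is worth keeping in mind when fitting it into the Map interface.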

You may want to time your words.cc program using the online text of the Bible, which is a very large text. Advice: do NOT test this using the vector classes unless you want to wait a while.


OPTIONAL Binary Search Trees

This is optional. Points earned here can be applied to the second test or to any other program.

You are also to implement a class BMap that uses binary search trees to implement a map class. This will involve creating both a bmap.h and bmap.cc file. You will also need to create an iterator class that uses an inorder traversal to access elements of the binary search tree class one-at-a-time. There is code in Weiss for this, but it is much more convoluted than it needs to be.

I will post guidelines indicating how to do the iterator much more easily, although you are free to use what's discussed in Weiss. There are some hints on binary search tree iterators.
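One common way to get an inorder iterator is to keep an explicit stack of nodes whose left subtrees have already been visited. The sketch below illustrates that idea only; it is not necessarily the approach in the posted guidelines or in Weiss, and the TreeNode layout, the class name InorderIterator, and its member functions are hypothetical.

```cpp
#include <cassert>
#include <stack>
#include <string>

struct TreeNode  // hypothetical
{
    std::string key;
    int value;
    TreeNode * left;
    TreeNode * right;
};

class InorderIterator  // hypothetical name and interface
{
  public:
    explicit InorderIterator(TreeNode * root) { PushLeftSpine(root); }

    bool HasMore() const { return !myStack.empty(); }

    // The node on top of the stack is the next one in sorted order.
    const TreeNode & Current() const
    {
        assert(!myStack.empty());
        return *myStack.top();
    }

    // Finish the current node, then descend into its right subtree.
    void Next()
    {
        assert(!myStack.empty());
        TreeNode * done = myStack.top();
        myStack.pop();
        PushLeftSpine(done->right);
    }

  private:
    // Push t and every node reachable by following left pointers from t.
    void PushLeftSpine(TreeNode * t)
    {
        for (; t != 0; t = t->left) myStack.push(t);
    }

    std::stack<TreeNode *> myStack;
};
```

The stack never holds more nodes than the height of the tree, and each node is pushed and popped exactly once over a full traversal.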

You must also use the binary search tree map class in words.cc and compare its performance to the hash implementation (and optionally to the unsorted/sorted vector implementations).


Grading

The hash implementation is worth 20 points: five points for the runs and README, five points for the hash table iterator, five points for the hash table class (including how well it is implemented, e.g., correctness), and five points for style/robustness of code. The binary search tree is worth 12 points, allocated 3, 3, 3, 3.

What to submit

You should submit a Makefile, a README, and all modified source code, including words.cc and the code for hashing.

Submit using

    submit100 map README Makefile words.cc .....