Rewrite the program to compute the number of times each different word occurs in the input file.  Print a list of the 10 most-frequent words in the input, one word per line.  Each line should contain the word, a tab, followed by the number of times it occurs.  Submit the program as “P9”.  A “word” is a sequence of upper and lower-case letters, separated by whitespace, or non-letters.  Treat the characters “_” and “’” as if they were letters for this program.

 

This version of the program should use a hash table lookup which uses linked lists to keep track of all the words which hash to the same location in the hash table.  I’d like you to write the list manipulation routines as a separate file.  The list elements should be allocated from a shared global array of elements (say 10,000 elements).  These elements should be structs, defined to include indices to (a) other list elements, and (b) locations in a shared global “string table” where the C string for each distinct word is stored (say 1000000 characters for all distinct words, plus separating \0’s).  (In other words, don’t use pointers for this.)  Your structs can include other information, like the occurrence count, also.

 

The list routines I propose are very low-level:

void push(int i, int j)

Push a list element which holds value i, and count 0 onto the list indicated at hashtable[j]

int next(int i)

Return the index of the next element on the list following element i, or -1 if no element follows i

int val(int i)

Return the value held in list element i.

int gcount(int i)

Return the count held in list element i

void scount(int i, int j)

Set the count in element i to j

 

The values held in each list element are intended to be indices into the string array, so that each list element represents a word, by pointing (via an index) to the beginning of the string representing that word.  The list element should also contain a count of the number of times this word has been seen.  The array of “10 words with largest counts” can hold list-element indices.