CPS 100, Fall 2006, KWIC


See the FAQ/help


Worth 28 points

A Key Word in Context (KWIC) index is useful for looking up titles, words, and other items. Words that aren't key words are ignored when generating a KWIC index. For example, suppose the words to ignore are "the", "of", "and", "as", and "a", and the list of titles is:

Descent of Man
The Ascent of Man
The Old Man and The Sea
A Portrait of The Artist As a Young Man

A KWIC index of these titles might look like this:

                      a portrait of the ARTIST as a young man 
                                    the ASCENT of man 
                                        DESCENT of man 
                             descent of MAN 
                          the ascent of MAN 
                                the old MAN and the sea 
    a portrait of the artist as a young MAN 
                                    the OLD man and the sea 
                                      a PORTRAIT of the artist as a young man 
                    the old man and the SEA 
          a portrait of the artist as a YOUNG man 
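The index above can be produced with a short routine. The sketch below is one possible approach, not the assignment's required design: it lower-cases each title, skips words in the ignore set, and emits one line per remaining word with that word in upper case, padded so all key words start in the same column. The class name `KwicIndex`, the method `build`, and the column width of 40 are illustrative assumptions. It needs Java 11 or later (`List.of`, `Set.of`, `String.repeat`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class KwicIndex {
    // Builds one line per key-word occurrence: the whole title in lower case,
    // the key word in upper case, padded so every key word starts at `column`.
    public static List<String> build(List<String> titles, Set<String> ignore, int column) {
        List<String[]> entries = new ArrayList<>(); // each entry: {key word, rendered line}
        for (String title : titles) {
            String[] words = title.toLowerCase().trim().split("\\s+");
            for (int i = 0; i < words.length; i++) {
                if (ignore.contains(words[i])) {
                    continue; // not a key word
                }
                String before = String.join(" ", Arrays.copyOfRange(words, 0, i));
                String after = String.join(" ", Arrays.copyOfRange(words, i + 1, words.length));
                String prefix = before.isEmpty() ? "" : before + " ";
                String suffix = after.isEmpty() ? "" : " " + after;
                String pad = " ".repeat(Math.max(0, column - prefix.length()));
                entries.add(new String[] {words[i], pad + prefix + words[i].toUpperCase() + suffix});
            }
        }
        // List.sort is stable, so equal key words keep their original title order.
        entries.sort((a, b) -> a[0].compareTo(b[0]));
        List<String> lines = new ArrayList<>();
        for (String[] e : entries) {
            lines.add(e[1]);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> titles = List.of(
            "Descent of Man",
            "The Ascent of Man",
            "The Old Man and The Sea",
            "A Portrait of The Artist As a Young Man");
        Set<String> ignore = Set.of("the", "of", "and", "as", "a");
        for (String line : build(titles, ignore, 40)) {
            System.out.println(line);
        }
    }
}
```

Running `main` should print the same eleven entries in the same order as the index above; because `List.sort` is stable, the four MAN entries stay in title order.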

A concordance is similar to a KWIC index. For example, two lines from the online copy of the Bible are reproduced below; assume they are lines 10 and 11 of the file. The placeholder line XXX YYYY ZZZZ stands for line 12, and later lines are not shown.

And the earth was without form, and void; and darkness was
upon the face of the deep. 
XXX YYYY ZZZZ

These lines might generate the following concordance:

      void and DARKNESS was upon the       10-11
        of the DEEP XXX YYYY ZZZZ          11-12
       and the EARTH was without form      10
      upon the FACE of the deep            11
   was without FORM and void and           10
  darkness was UPON the face of            10-11
      form and VOID and darkness was       10
     earth was WITHOUT form and void       10

In this concordance, words with fewer than four letters (such as "and", "the", "was", and "of") are not treated as key words and aren't listed.

Write a program that generates a concordance from a text file. Storage efficiency is an important consideration in designing your program, but correctness and design matter more. Storage efficiency is much more important than speed (though programs that are unbelievably slow may not be graded).
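Since storage efficiency outranks speed, one layout worth considering (a sketch under assumptions, not a prescribed design) stores each distinct word once in a table and represents the text as two parallel `int` arrays: the word id at each position and the line that position came from. A concordance entry is then just a position, and its context can be rebuilt on demand rather than copied into every entry. `CompactText` and its methods are hypothetical names. Tokenizing is simplified to runs of ASCII letters, and the minimum key-word length of four matches what the sample concordance implies (AND, THE, and WAS are not listed).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompactText {
    private final List<String> wordTable = new ArrayList<>(); // id -> word, each word stored once
    private final Map<String, Integer> idOf = new HashMap<>(); // word -> id
    private int[] ids = new int[16];   // word id at each text position
    private int[] lines = new int[16]; // source line number at each position
    private int size = 0;

    // Adds one line of text; tokenizing is simplified to runs of ASCII letters.
    public void addLine(String text, int lineNumber) {
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (w.isEmpty()) continue;
            Integer id = idOf.get(w);
            if (id == null) { // first time this word is seen: store it once
                id = wordTable.size();
                wordTable.add(w);
                idOf.put(w, id);
            }
            if (size == ids.length) { // grow both arrays together
                ids = Arrays.copyOf(ids, size * 2);
                lines = Arrays.copyOf(lines, size * 2);
            }
            ids[size] = id;
            lines[size] = lineNumber;
            size++;
        }
    }

    public String wordAt(int pos) { return wordTable.get(ids[pos]); }
    public int lineAt(int pos) { return lines[pos]; }
    public int size() { return size; }
    public int distinctWords() { return wordTable.size(); }

    // Key-word positions sorted by word, then by position; no context is copied.
    public List<Integer> keyPositions(int minLength) {
        List<Integer> keys = new ArrayList<>();
        for (int p = 0; p < size; p++) {
            if (wordAt(p).length() >= minLength) keys.add(p);
        }
        keys.sort((a, b) -> {
            int c = wordAt(a).compareTo(wordAt(b));
            return c != 0 ? c : Integer.compare(a, b);
        });
        return keys;
    }

    public static void main(String[] args) {
        CompactText t = new CompactText();
        t.addLine("And the earth was without form, and void; and darkness was", 10);
        t.addLine("upon the face of the deep.", 11);
        for (int p : t.keyPositions(4)) {
            System.out.println(t.wordAt(p).toUpperCase() + " " + t.lineAt(p));
        }
    }
}
```

With the two sample lines, `main` prints the eight key words in the same order as the sample concordance, each with its line number. Boxing the positions in a `List<Integer>` keeps the sketch short; a plain `int[]` with a hand-written sort would use less memory.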

The output should be sorted by key word, with key words in upper case and all other words in lower case. Punctuation that is not internal to a word (for example, the comma after "form" or the period after "deep") should be ignored.
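One way to apply the punctuation rule, assuming the text has first been split on whitespace, is to trim non-alphanumeric characters from both ends of each token while leaving anything in the middle alone. `WordCleaner.clean` is a hypothetical helper; treating digits as word characters is an assumption, not part of the assignment.

```java
public class WordCleaner {
    // Strips punctuation from both ends of a whitespace-separated token but keeps
    // internal punctuation such as the apostrophe in "don't" or the hyphen in "well-known".
    // Returns the cleaned token in lower case, or "" if nothing is left.
    public static String clean(String token) {
        int start = 0;
        int end = token.length();
        while (start < end && !Character.isLetterOrDigit(token.charAt(start))) start++;
        while (end > start && !Character.isLetterOrDigit(token.charAt(end - 1))) end--;
        return token.substring(start, end).toLowerCase();
    }

    public static void main(String[] args) {
        for (String raw : "void; \"don't\" well-known -- deep.".split("\\s+")) {
            System.out.println("[" + clean(raw) + "]");
        }
    }
}
```

A token that cleans to the empty string (such as `--`) is not a word and should be skipped.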

The context for each key word consists of the two words before it and the three words after it; as the sample shows, the context may run across line boundaries. Ideally, these counts will be configurable in your program.
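The two-before/three-after rule can be kept configurable by passing the counts to a small object instead of hard-coding them. This sketch assumes the text is already stored as parallel lists of cleaned words and line numbers; `ContextWindow` and `render` are hypothetical names. The window is clipped at the start and end of the text, and the line range is taken from the first and last words shown, which reproduces the `10-11` style of the sample.

```java
import java.util.List;

public class ContextWindow {
    private final int before; // words shown before the key word
    private final int after;  // words shown after the key word

    public ContextWindow(int before, int after) {
        this.before = before;
        this.after = after;
    }

    // Renders the context of words.get(pos) with the key word in upper case,
    // followed by the line (or line range) that the shown words span.
    public String render(List<String> words, List<Integer> lines, int pos) {
        int from = Math.max(0, pos - before);
        int to = Math.min(words.size() - 1, pos + after);
        StringBuilder sb = new StringBuilder();
        for (int i = from; i <= to; i++) {
            if (i > from) sb.append(' ');
            sb.append(i == pos ? words.get(i).toUpperCase() : words.get(i));
        }
        int first = lines.get(from);
        int last = lines.get(to);
        sb.append("  ").append(first == last ? String.valueOf(first) : first + "-" + last);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = List.of("and", "the", "earth", "was", "without", "form",
            "and", "void", "and", "darkness", "was", "upon", "the", "face", "of", "the", "deep");
        List<Integer> lines = List.of(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
            11, 11, 11, 11, 11, 11);
        ContextWindow window = new ContextWindow(2, 3);
        System.out.println(window.render(words, lines, 9)); // darkness
        System.out.println(window.render(words, lines, 2)); // earth
    }
}
```

`main` should print `void and DARKNESS was upon the  10-11` and `and the EARTH was without form  10`, matching the sample apart from the column padding.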

Your program should include an option to read a file of words to ignore; these words are never treated as key words. More generally, the criterion for deciding whether a word is a key word may change, and your program should be able to cope with that.
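A changing key-word criterion maps naturally onto `java.util.function.Predicate<String>`: each rule is a predicate, and rules combine with `and`, `or`, and `negate`. The sketch assumes the ignore file is plain text with whitespace-separated words; `KeyWordRules` and its methods are hypothetical names. `Files.readAllLines(Path)` requires Java 8 or later and decodes the file as UTF-8.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class KeyWordRules {
    // Key words must have at least n characters.
    public static Predicate<String> minLength(int n) {
        return w -> w.length() >= n;
    }

    // Key words must not appear in the ignore set (compared in lower case).
    public static Predicate<String> notIgnored(Set<String> ignore) {
        return w -> !ignore.contains(w.toLowerCase());
    }

    // Collects ignore words from lines of text: every whitespace-separated token counts.
    public static Set<String> parseIgnoreWords(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) words.add(w.toLowerCase());
            }
        }
        return words;
    }

    // Reads an ignore-word file, for example one chosen from a menu item.
    public static Set<String> readIgnoreFile(Path file) throws IOException {
        return parseIgnoreWords(Files.readAllLines(file));
    }

    public static void main(String[] args) {
        Set<String> ignore = parseIgnoreWords(List.of("upon without", "  face"));
        Predicate<String> isKey = minLength(4).and(notIgnored(ignore));
        for (String w : List.of("the", "earth", "upon", "darkness", "Face")) {
            System.out.println(w + " -> " + isKey.test(w));
        }
    }
}
```

`main` prints `false` for `the` (too short) and for `upon` and `Face` (ignored), and `true` for `earth` and `darkness`. Swapping in a different rule only changes how `isKey` is built, not the code that uses it.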

Coding Details

You should create a Model and a View for this program, using code from previous assignments as patterns for your own. You should provide menu items for reading a list of words to ignore when building the concordance and for reading the file whose concordance will be generated and displayed in the view.

You'll need at least three classes: the model, the view, and a main class to launch them.
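The three-class structure might be wired as below. Every name here (`ConcordanceView`, `ConcordanceModel`, `ConsoleView`, `ConcordanceMain`) is hypothetical rather than taken from earlier assignments, and a console view stands in for the Swing view so the sketch runs without a display. The menu items would call `setIgnoreWords` and `process` after reading their files; the logic inside `process` is a placeholder, not a full concordance. Save the file as `ConcordanceMain.java`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical view interface; a Swing view with the two menu items would implement it too.
interface ConcordanceView {
    void showLines(List<String> lines);
}

// Hypothetical model: owns the ignore words and the processing logic,
// knows nothing about Swing, and pushes results to every registered view.
class ConcordanceModel {
    private final List<ConcordanceView> views = new ArrayList<>();
    private final Set<String> ignore = new HashSet<>();

    void addView(ConcordanceView view) {
        views.add(view);
    }

    // What the "read words to ignore" menu item would call after reading its file.
    void setIgnoreWords(List<String> words) {
        ignore.clear();
        for (String w : words) ignore.add(w.toLowerCase());
    }

    // What the "read text file" menu item would call. Placeholder logic: list each
    // non-ignored word in upper case with its 1-based line number, sorted.
    void process(List<String> textLines) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < textLines.size(); i++) {
            for (String w : textLines.get(i).toLowerCase().split("[^a-z]+")) {
                if (!w.isEmpty() && !ignore.contains(w)) {
                    result.add(w.toUpperCase() + " " + (i + 1));
                }
            }
        }
        Collections.sort(result);
        for (ConcordanceView view : views) {
            view.showLines(result);
        }
    }
}

// Console stand-in for the Swing view.
class ConsoleView implements ConcordanceView {
    public void showLines(List<String> lines) {
        for (String line : lines) System.out.println(line);
    }
}

// Main class: creates the model and the view and connects them.
public class ConcordanceMain {
    public static void main(String[] args) {
        ConcordanceModel model = new ConcordanceModel();
        model.addView(new ConsoleView());
        model.setIgnoreWords(List.of("the", "and", "of"));
        model.process(List.of("And the earth was without form,", "upon the face of the deep."));
    }
}
```

Because the model talks only to the `ConcordanceView` interface, the console view can be replaced by a Swing view without touching the model.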


Last modified: Wed Nov 29 11:31:34 EST 2006