Please submit a compressed tar file containing all source code and a README. You should include a design document, either in the submission or hardcopy you turn in (for this assignment there's no folder/binder of stuff to turn in).
submit108 woofii woofii.tgz
You must write a program that reads a text file, records the different words in it, and generates as output a list of the words and the line numbers on which each word occurs.
You can invoke the program in different ways:
The program will be graded 40% on design/clarity, 30% on correctness, and 30% on speed. Benchmark times for two text files will be posted; you must beat the benchmark times to earn more than 10% of the 30% for speed points. To beat the times you will likely need to use an STL map (or something better or equivalent in terms of efficiency) and C-style strings (and maybe C-style I/O), so you should design your program to make it easy to incorporate changes in I/O.
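One possible starting point for the map-based design is sketched below. It is only an illustration: the names `LineIndex`, `buildLineIndex`, and `printLineIndex` are hypothetical, the output format is an assumption, and words are split on whitespace only, with no punctuation or case handling. Reading from a `std::istream` keeps the I/O in one place, so it could later be swapped for C-style I/O if the benchmark requires it.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: each word maps to the line numbers on which it occurs.
using LineIndex = std::map<std::string, std::vector<int>>;

// Read text line by line and record, for every whitespace-delimited word,
// the lines it appears on. A line number is stored once per word even if
// the word occurs several times on that line.
LineIndex buildLineIndex(std::istream& in) {
    LineIndex index;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber;
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            std::vector<int>& lines = index[word];
            if (lines.empty() || lines.back() != lineNumber) {
                lines.push_back(lineNumber);
            }
        }
    }
    return index;
}

// Print one word per output line followed by its line numbers.
// std::map iterates in key order, so the words come out sorted.
void printLineIndex(const LineIndex& index, std::ostream& out) {
    for (const auto& entry : index) {
        out << entry.first << ':';
        for (int n : entry.second) out << ' ' << n;
        out << '\n';
    }
}
```

A caller could pass a `std::ifstream` opened on the text file to `buildLineIndex` and `std::cout` to `printLineIndex`.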
Words are whitespace-delimited sequences of alphanumeric characters with leading and trailing punctuation removed (internal punctuation is OK). Letters should be converted to lowercase.
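The word rule above might be implemented along the following lines. This is a sketch under assumptions: the function name `normalizeWord` is hypothetical, "punctuation" is taken to mean any non-alphanumeric character, and a token with no alphanumeric characters is treated as no word at all (an empty result).

```cpp
#include <cctype>
#include <string>

// Strip non-alphanumeric characters from both ends of a whitespace-delimited
// token and convert the remainder to lowercase. Punctuation inside the word
// (as in "don't") is kept. Returns "" if nothing alphanumeric is found.
std::string normalizeWord(const std::string& token) {
    std::size_t first = 0;
    std::size_t last = token.size();
    while (first < last &&
           !std::isalnum(static_cast<unsigned char>(token[first]))) {
        ++first;
    }
    while (last > first &&
           !std::isalnum(static_cast<unsigned char>(token[last - 1]))) {
        --last;
    }
    std::string word = token.substr(first, last - first);
    for (char& c : word) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return word;
}
```

The casts to `unsigned char` are there because passing a negative `char` value to `std::isalnum` or `std::tolower` is undefined behavior.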
Please submit a compressed tar file containing all source code and a README. You should include a design document, either in the submission or hardcopy you turn in (for this assignment there's no folder/binder of stuff to turn in).
submit108 kwic kwic.tgz
A Key Word in Context (KWIC) index is useful in looking up titles, words, and other things. Words that aren't key words are ignored in generating a KWIC index. For example, if the words to ignore are "the, of, and, as, a" and a list of titles is:
Descent of Man
The Ascent of Man
The Old Man and The Sea
A Portrait of The Artist As a Young Man
A KWIC-index of these titles might be given by:
a portrait of the ARTIST as a young man
the ASCENT of man
DESCENT of man
descent of MAN
the ascent of MAN
the old MAN and the sea
a portrait of the artist as a young MAN
the OLD man and the sea
a PORTRAIT of the artist as a young man
the old man and the SEA
a portrait of the artist as a YOUNG man
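The index above can be reproduced with a short sketch like the one below. The function names are hypothetical, and the tie-breaking rule (entries with the same key word stay in input order) is an assumption inferred from the order of the four MAN entries; the handout does not state it.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

std::string lowerCopy(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

std::string upperCopy(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Build one index line per key word occurrence: the whole title in lowercase
// with the key word in uppercase. Words in `ignore` never act as key words.
std::vector<std::string> kwicIndex(const std::vector<std::string>& titles,
                                   const std::set<std::string>& ignore) {
    using Entry = std::pair<std::string, std::string>;  // (key word, line)
    std::vector<Entry> entries;
    for (const std::string& title : titles) {
        std::vector<std::string> words;
        std::istringstream in(lowerCopy(title));
        for (std::string w; in >> w;) words.push_back(w);
        for (std::size_t k = 0; k < words.size(); ++k) {
            if (ignore.count(words[k]) != 0) continue;
            std::string line;
            for (std::size_t i = 0; i < words.size(); ++i) {
                if (i > 0) line += ' ';
                line += (i == k) ? upperCopy(words[i]) : words[i];
            }
            entries.emplace_back(words[k], line);
        }
    }
    // Stable sort by key word only, so ties keep their input order.
    std::stable_sort(entries.begin(), entries.end(),
                     [](const Entry& a, const Entry& b) { return a.first < b.first; });
    std::vector<std::string> lines;
    for (const Entry& e : entries) lines.push_back(e.second);
    return lines;
}
```

Called with the four titles and the ignore set {the, of, and, as, a}, this yields the eleven lines listed above, starting with the ARTIST entry and ending with the YOUNG entry.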
A concordance is similar to a KWIC index. For example, two lines from the online copy of the Bible are reproduced below. Assume that these are lines 10 and 11.
And the earth was without form, and void; and darkness was
upon the face of the deep.
These lines might generate a concordance as follows:
void and DARKNESS was upon the 10-11
of the DEEP 11
and the EARTH was without form 10
upon the FACE of the deep 11
was without FORM and void and 10
darkness was UPON the face of 10-11
form and VOID and darkness was 10
earth was WITHOUT form and void 10
In this concordance, words of fewer than three letters are not
considered as Key Words, and aren't listed in the concordance.
Write a program that generates a concordance from a text file. Storage
efficiency is an important consideration in designing your program, but
the correctness and design of the program are more important. Storage
efficiency is much more important than speed (though programs that are
unbelievably slow will lose some points.)
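One way to think about storage, offered only as an illustration, is to keep each distinct word once and represent the text as a sequence of small integer ids, so that a context can be rebuilt from positions rather than stored as copied strings. The class `WordTable` below is hypothetical. Note that this simple version holds each distinct word twice (once in the vector, once as a map key), which a more careful design might avoid.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical word table: assigns each distinct word a compact id.
class WordTable {
public:
    // Return the id of `word`, adding the word on first sight.
    std::uint32_t intern(const std::string& word) {
        auto it = ids_.find(word);
        if (it != ids_.end()) return it->second;
        std::uint32_t id = static_cast<std::uint32_t>(words_.size());
        words_.push_back(word);
        ids_.emplace(word, id);
        return id;
    }

    // Look up the text for an id previously returned by intern().
    const std::string& text(std::uint32_t id) const { return words_[id]; }

    // Number of distinct words seen so far.
    std::size_t size() const { return words_.size(); }

private:
    std::vector<std::string> words_;
    std::unordered_map<std::string, std::uint32_t> ids_;
};
```

With such a table, the whole input could be held as a `std::vector<std::uint32_t>`, and a concordance entry would need only the position of its key word.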
The output should be sorted by keyword, with keywords capitalized and other words in lower case. Punctuation not internal to words should be ignored.
The context for each key word is two words before the key word and three words after the key word. Ideally these values will be configurable in your program.
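A minimal sketch of a concordance generator with a configurable context window follows. All names are hypothetical. Two assumptions are worth stating: the line range printed for an entry is taken to run from the line of the first context word to the line of the last one, and the key word test is passed in as a function so that the caller decides what counts as a key word.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Token {
    std::string word;  // lowercased, outer punctuation removed
    int line;          // line number the word came from
};

// Lowercase a raw token and strip non-alphanumeric characters from both ends.
std::string cleanToken(const std::string& raw) {
    std::size_t first = 0, last = raw.size();
    while (first < last && !std::isalnum(static_cast<unsigned char>(raw[first]))) ++first;
    while (last > first && !std::isalnum(static_cast<unsigned char>(raw[last - 1]))) --last;
    std::string word = raw.substr(first, last - first);
    for (char& c : word) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return word;
}

std::string upperCase(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Build concordance entries sorted by key word. `before` and `after` are the
// number of context words on each side; `firstLine` numbers the first line.
std::vector<std::string> buildConcordance(
        std::istream& in, int firstLine, std::size_t before, std::size_t after,
        const std::function<bool(const std::string&)>& isKeyWord) {
    std::vector<Token> tokens;
    std::string line;
    int lineNumber = firstLine - 1;
    while (std::getline(in, line)) {
        ++lineNumber;
        std::istringstream words(line);
        for (std::string raw; words >> raw;) {
            std::string w = cleanToken(raw);
            if (!w.empty()) tokens.push_back({w, lineNumber});
        }
    }
    // Collect positions of key words, then sort the positions by word.
    std::vector<std::size_t> keys;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (isKeyWord(tokens[i].word)) keys.push_back(i);
    }
    std::stable_sort(keys.begin(), keys.end(), [&tokens](std::size_t a, std::size_t b) {
        return tokens[a].word < tokens[b].word;
    });
    std::vector<std::string> entries;
    for (std::size_t k : keys) {
        std::size_t lo = (k >= before) ? k - before : 0;
        std::size_t hi = std::min(tokens.size() - 1, k + after);
        std::string entry;
        for (std::size_t i = lo; i <= hi; ++i) {
            if (i > lo) entry += ' ';
            entry += (i == k) ? upperCase(tokens[i].word) : tokens[i].word;
        }
        entry += ' ';
        entry += std::to_string(tokens[lo].line);
        if (tokens[hi].line != tokens[lo].line) {
            entry += '-';
            entry += std::to_string(tokens[hi].line);
        }
        entries.push_back(entry);
    }
    return entries;
}
```

To reproduce the sample concordance, the sketch can be run with `firstLine` 10, two words before, and three after. That run relies on two further assumptions: that line 10 ends after "darkness was" (inferred from the line numbers in the sample), and that "and", "the", and "was" are rejected by the key word test in addition to words of fewer than three letters, since the sample does not list them.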
Your program should include the option of reading a file of words; these words will be ignored when determining whether a word is a key word. More generally, the criterion for deciding whether a word is a key word may change, and your program should be able to cope with such changes.