Due: Tuesday, Sept. 17 by 11pm
30 points
Word puzzles like the Jumble shown below, and found in many newspapers are based on anagrams too. You can also play jumbles online at jumble.com.
This assignment involves writing code and reasoning about two different methods for using a program to solve jumbles.
prompt> allanagrams datafile gazer graze gases sages heals leash shale heaps phase shape hares hears share shear earth hater heart haste hates heats haves shave arise raise lakes leaks
cd mkdir cd cps100 mkdir anagram cp ~rodger/cps100/anagram/* . ln -s ~rodger/data/words .(The link created on the last line is to a large data file of words, you should use smaller files to test your program.)
Your job is to study this code, understand it, then implement the program allanagrams described above (see step 1 of what you must do). You can write any code/classes you want to, the output of your program should be a list of all the anagrams in a data file read by the program. Each line is a list of words that are anagrams, each line printed contains at least two words (don't print a word that has no anagram other than itself).
You're to implement a new class, HistoNormal that uses a fingerprint/signature/histogram to determine if two strings are anagrams. For example the string "bagel" and "gable" have the same normalized/canonical signature histogram:
1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0which indicates one 'a', one 'b', one 'e', one 'g', and one 'l'. The signature of aardvark, for example, is
3 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 1 0 0 0 0
Since the function Normalizer::normalize must return a string, the histogram should be converted to a string with each count separated from others by a colon (or other character). For example, the normalized form of "bagel" and "gable" would be
1:1:0:0:1:0:1:0:0:0:0:1:0:0:0:0:0:0:0:0:0:0:0:0:0:0
The code in ostrexample.cpp shows how an ostrstream can be used to write to a string like it's a stream. You can find information about the class ostrstream (which should be called ostringstream, but g++ is lagging in the standard) in Section 9.2.3 of Tapestry (pp 413-414).
When normalizing you should convert all characters to lowercase and you should ignore punctuation, this should be done in the Normalizer code, not (necessarily) in client code. The string returned should have 26 counts, one for each of 'a'-'z', and each count should be separated from others by a colon, there should be neither a leading nor a trailing colon, colons should separate numbers.
You should test your class by using it to find all the anagrams using the program allanagrams.cpp you wrote. You can also test it using findana.cpp provided to you, or you can write your own test program. Good test programs can receive extra credit. They must be turned in and described in your README file in order to receive the extra credit.
EXEC = allanagrams SRC_FILES = allanagrams.cpp anaword.cpp sortnormal.cpp histonormal.cpp
If you're compiling using make allanagrams, and you don't
have a file histonormal.cpp, you'll get an error message
as shown below:
make: *** No rule to make target `histonormal.o', needed by `allanagrams'.
make: Target `allanagrams' not remade because of errors.
make: warning: Clock skew detected. Your build may be incomplete.
To fix this problem, either create a file for histonormal.cpp so it will
compile. Alternatively,
remove the word histonormal.cpp from
the Makefile for allanagrams. You'll need to add it back later
when you have a histonormal.cpp.
the theEven more ideally, you won't repeat words as anagrams if they differ only in case, i.e., hello Hello. However, this is a small portion of the overall grade, so it shouldn't be a major concern. Note that using a StringSet object (see Tapestry) is one way of coping with this problem.
Criteria | Points |
---|---|
implementation of HistoNormal | 5 |
method/code in allanagrams.cpp | 15 |
this includes quality of code, quality of algorithm/method used, correctness, comments. | |
README write-up/analysis | 6 |
data files | 4 |
Extra Credit: (great test program) | 2 |
You must also submit two data files as described above. Your README should describe these data files and why each is remarkably better for one normalizer than another.
To submit (your data files can have any name, but the example below uses some name to make it clear you need to submit data files).
submit_cps100 anagram README Makefile *.h *.cpp datafile1 datafile2
If submit_cps100 doesn't work, try ~rodger/bin/submit_cps100.