Anagrams, CPS 100, Fall 1997

Due Date: Early bonus: Friday, September 12
On-time Date: Monday, September 15

This assignment provides practice with combining classes and code into one program, uses overloaded operators, and provides an introduction to lists and to sorting.


Supplied Files

A Makefile and sample input files are accessible in ~ola/cps100/anagram on the acpub system. Be sure to create a subdirectory anagram for this problem and to set the permissions for access by prof/uta/ta by typing

fs setacl anagram ola:cps100 read

Introduction

Two words are anagrams if they are composed of the same letters. For example, "bagel" and "gable" are anagrams as are "drainage" and "gardenia". In this assignment you'll write a program that reads a dictionary (a sorted list of words) and finds all the words that are anagrams of each other.
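
The definition can be checked directly in code. The fragment below is only a sketch: the names sortedLetters and areAnagrams are made up for illustration and are not part of the supplied files, and std::sort is used as a convenient stand-in for whatever sort you write yourself.

```cpp
#include <algorithm>
#include <string>

// Returns a copy of word with its letters in sorted order,
// e.g., "bagel" becomes "abegl". (Hypothetical helper.)
std::string sortedLetters(std::string word)
{
    std::sort(word.begin(), word.end());
    return word;
}

// Two words are anagrams when their sorted letters match.
// (Hypothetical helper, not part of anaword.h.)
bool areAnagrams(const std::string& a, const std::string& b)
{
    return sortedLetters(a) == sortedLetters(b);
}
```

With these helpers, areAnagrams("bagel", "gable") is true and areAnagrams("bagel", "bagels") is false.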

Input/Output

Your program should read a file of words separated by whitespace. You can assume the words are unique (each occurs once in the file) and sorted. Two example files are provided; you may want to create smaller examples to test your program. The example files are in ~ola/data on the acpub system as words5 and words4-8. There is also a file words4. Don't copy these files; link to them instead:
           ln -s ~ola/data/words5 words5

The output should be a sequence of lines, where each line contains words that are anagrams of each other. For example:

begin being binge
caret carte cater crate trace
argon groan organ

When using the List class, the output can be generated using the List::Print function, so parentheses are acceptable:

( begin being binge )
( caret carte cater crate trace )
( argon groan organ )

Coding and Algorithm

You must use the class Anaword whose declaration is in the file anaword.h; you will need to write the implementation of this class in the file anaword.cc although this has been started for you.

An Anaword object is constructed from a string, and prints as the string, but is compared using a normalized or canonical form created by sorting the string. For example, the code fragment below should print the two lines of output shown.

   Anaword a("bagel");
   Anaword b("gable");
   cout << a << " " << b << endl;
   if (a == b) cout << "they're anagrams!" << endl;

Output as shown:
 bagel gable
 they're anagrams!
The objects a and b are equal because the operator == is overloaded for Anaword objects and uses the sorted or normalized form of the word for comparison. The normalized form of "bagel" and "gable" is "abegl", the sorted version of each word.

In an Anaword object, the private instance variable mySortedWord should store the sorted version of the string in myWord, e.g., "abegl" when myWord is "bagel".

You must implement the member functions described in anaword.h so that the real word (e.g., "bagel") is used for printing, but the normalized or sorted word is used for comparison using == and <.

You'll need to implement the function Anaword::Normalize, and write implementations for overloaded operators != and <=. You can implement all overloaded boolean operators, e.g., !=, using Anaword::Equal, Anaword::Less, or operators you've already implemented, e.g., operator ==.
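
One way to build the remaining operators from Equal and Less is sketched below. The class SketchWord is a hypothetical stand-in, since the real declaration lives in anaword.h and may differ; the accessor Word and the use of std::string and std::sort are assumptions made only to keep the fragment self-contained.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-in for Anaword; the real declaration is in anaword.h.
class SketchWord
{
  public:
    SketchWord(const std::string& s) : myWord(s), mySortedWord(s)
    {
        // stand-in for Normalize: sort the letters of the copy
        std::sort(mySortedWord.begin(), mySortedWord.end());
    }
    const std::string& Word() const { return myWord; }   // assumed accessor

    // comparisons use only the sorted form
    bool Equal(const SketchWord& rhs) const { return mySortedWord == rhs.mySortedWord; }
    bool Less(const SketchWord& rhs) const  { return mySortedWord < rhs.mySortedWord; }

  private:
    std::string myWord;         // regular string: "bagel"
    std::string mySortedWord;   // sorted form: "abegl"
};

bool operator==(const SketchWord& a, const SketchWord& b) { return a.Equal(b); }
bool operator< (const SketchWord& a, const SketchWord& b) { return a.Less(b); }

// the rest are written in terms of operators already defined
bool operator!=(const SketchWord& a, const SketchWord& b) { return !(a == b); }
bool operator<=(const SketchWord& a, const SketchWord& b) { return !(b < a); }
```

Writing <= as "not (b < a)" means only Less has to know how the sorted strings are compared.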

Algorithm

You're given code that reads all the words in a file, and stores them in a vector of Anawords. Storing Anaword objects uses the Vector::append function which automatically grows the vector. When using append, the function Vector::size returns the number of elements stored in a vector, which is not the same as the vector's capacity, since the vector grows to accommodate new elements as needed.
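
The difference between size and capacity can be seen with the standard library's std::vector, used here as an assumed analogue of the course's Vector class (push_back plays the role of Vector::append). The struct Usage and the function appendAll are hypothetical names for this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of what a vector reports after appending.
struct Usage
{
    std::size_t size;       // number of elements actually stored
    std::size_t capacity;   // room currently allocated, at least size
};

// Appends every word to an initially empty vector and reports
// both counts. std::vector stands in for the course's Vector.
Usage appendAll(const std::vector<std::string>& words)
{
    std::vector<std::string> list;
    for (const std::string& w : words) {
        list.push_back(w);   // grows the vector as needed
    }
    return Usage{ list.size(), list.capacity() };
}
```

After appending three words the size is exactly 3, while the capacity is whatever the vector chose to allocate, which may be larger.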

The templated function QuickSort is called, but won't compile until you implement <= for Anaword objects. Since the canonical form of a word is used for comparisons, all anagrams will be adjacent in the vector of Anawords.

After sorting, all anagrams will be adjacent to each other, but there will be lots of singleton words that aren't anagrams of anything. You need to process the elements of list, the vector of Anaword objects, and print all the anagrams. To exceed expectations (see below), you should write code that creates a vector of lists of Anaword objects, where each element of the new vector is a list of anagrams:

Vector<List<Anaword> > analist;

Note the space in the definition between the two > symbols. It is needed because the compiler would otherwise read >> as a single shift operator. The class List is described in the Tapestry book in section 6.7, beginning on page 298. Although you're free to use all the List functions, it's possible to write the program using only List::Append, List::Clear, and List::Print. The basic idea in creating analist is described below:

   create a List object temp, initially empty
   loop over list, the vector of Anaword objects,
   looking for runs of equal Anaword objects, these are anagrams:
     
      add runs of equal words to temp.  When done with a run,
      if the number of elements in temp is greater than one,
      add temp to analist, the vector of lists of Anawords
     
The basic idea is diagrammed below:
[diagram]

When processing Anawords for "agree" and "eager", these will be added to the list temp since the words are equal (they have the same canonical form: "aeegr"). The Anaword "heave" is processed next. Since it's different from "eager", the previous run is done, and the list temp is added to analist since it has more than one element in it. Then temp is cleared and "heave" is added to it, in preparation for starting to process the next (potential) run. When "lease" is processed next, it's different from "heave", and the list with "heave" in it has only one element, so it is not added to analist. However, temp is cleared, "lease" is added to it, and then the next element, "easel", is processed.

Note that you'll need to erase all elements in temp when done processing a run.
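
The run-finding loop described above can be sketched with standard containers. This is not the supplied code: std::vector stands in for both Vector and List, the struct Entry is a hypothetical stand-in for Anaword, and std::stable_sort takes the place of QuickSort, so the order of words within a run may differ from what your program prints.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for Anaword: the word and its sorted form.
struct Entry
{
    std::string word;
    std::string key;
};

std::vector<std::vector<std::string> >
groupAnagrams(const std::vector<std::string>& words)
{
    std::vector<Entry> list;
    for (const std::string& w : words) {
        std::string key = w;
        std::sort(key.begin(), key.end());
        list.push_back(Entry{ w, key });
    }
    // sort by canonical form so that anagrams become adjacent
    std::stable_sort(list.begin(), list.end(),
                     [](const Entry& a, const Entry& b) { return a.key < b.key; });

    std::vector<std::vector<std::string> > analist;
    std::vector<std::string> temp;
    for (std::size_t i = 0; i < list.size(); i++) {
        // a different key means the previous run is done
        if (i > 0 && list[i].key != list[i - 1].key) {
            if (temp.size() > 1) analist.push_back(temp);
            temp.clear();
        }
        temp.push_back(list[i].word);
    }
    // the final run is still in temp when the loop ends
    if (temp.size() > 1) analist.push_back(temp);
    return analist;
}
```

Note the check after the loop: without it, a run of anagrams at the very end of the vector would never be added to analist.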

Grading

Expectations are that you will implement the Anaword class and write code that prints all anagrams as described above. To sort strings in Anaword::Normalize you should use selection sort.
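
For reference, selection sort applied to the characters of a string might look like the sketch below. The free function selectionSorted is a hypothetical name; Anaword::Normalize itself is a member function with whatever signature anaword.h gives it, and std::string is assumed here in place of the course's string class.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Returns a copy of s with its characters in ascending order,
// using selection sort. (Hypothetical helper for illustration.)
std::string selectionSorted(std::string s)
{
    for (std::size_t i = 0; i + 1 < s.size(); i++) {
        // find the smallest character in s[i..end)
        std::size_t minIndex = i;
        for (std::size_t j = i + 1; j < s.size(); j++) {
            if (s[j] < s[minIndex]) minIndex = j;
        }
        // move it into position i
        std::swap(s[i], s[minIndex]);
    }
    return s;
}
```

Both selectionSorted("bagel") and selectionSorted("gable") yield "abegl".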

You can choose NOT to use the list class at all, but write code that prints all anagrams simply by processing the words in the vector list in doana.cc. However, you'll earn a maximum of 8/10 if you don't use the List class -- using the List class exceeds expectations.

This is a minor assignment worth 10 points. You will receive 8/10 for a program that meets all expectations reasonably well. Style of code counts for 2/10 of the points, correctness for 5/10, exceeding expectations for 2/10, and the README file for 1/10.

Submitting

You should create a README file for this and all assignments. All README files should include your name as well as the name(s) of anyone with whom you collaborated on the assignment and the amount of time you spent. In addition, you should write any comments you have about the assignment, what you liked and disliked. For this assignment you should also include your favorite anagrams in the README file.

To submit your assignment, type:

submit100 anagram README *.cc *.h Makefile

Be sure to submit all source files as shown and your Makefile.

Extra Credit

For extra credit you should implement Anaword using another method, described below. You should time both implementations and write up your findings in your README file. Take care to have enough data from running the code to back up any claims you make about the two methods.
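
One possible way to collect timing data is sketched below, assuming a compiler that supports std::chrono; the function secondsToRun is a hypothetical helper, not something supplied with the assignment. Averaging over several repetitions gives steadier numbers than a single run.

```cpp
#include <chrono>

// Runs work() the given number of times and returns the average
// wall-clock seconds per run. (Hypothetical helper; repetitions
// is assumed to be at least 1.)
template <typename Work>
double secondsToRun(Work work, int repetitions)
{
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; i++) {
        work();
    }
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / repetitions;
}
```

The argument can be any callable, for example a small function that reads words4-8 and finds all the anagrams with one of the two implementations.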

Modify the class Anaword so that the private string variables are pointers to strings rather than strings:

class Anaword
{
    // from before

  private:
    void Normalize();        // helper function, sorts
    string * myWord;         // regular string: "bagel"
    string * mySortedWord;   // sorted form: "abegl"
};

You should do this only after you've implemented the original version of the program and timed it in finding all the anagrams in the list of all 21,000+ words in the file words4-8. Then you should time the new pointer-based version of the program and compare the times. To submit your assignment, type:

submit100 anagram.xtra README *.cc *.h Makefile
Owen L. Astrachan
Last modified: Tue Sep 2 08:29:42 EDT 1997