CPS 100, Spring 1997, Anagrams

Due Date: Early bonus: Monday, January 27, 8:00 am
Final Due Date: Friday, January 31, 8:00 am

This assignment will provide practice with pointers, classes, sorting, overloaded operators, and reasoning about alternative implementations.

Table of Contents

[ Supplied Files | Introduction | Input/Output | Coding/Algorithm | Grading | Submitting | Extra ]


Supplied Files

(A Makefile and sample input files are accessible in ~ola/cps100/anagram on the acpub system. Be sure to create a subdirectory anagram for this problem and to set the permissions for access by prof/uta/ta by typing fs setacl anagram ola:cps100 read.)

For users outside the acpub system:

  1. Makefile (site specific)
  2. anaword.h
  3. anaword.cc (skeleton)
  4. anafind.cc (skeleton)
  5. words5 (4,176 five letter words from Linux /usr/dict/words)
  6. words (45,402 words from Linux /usr/dict/words)

Introduction

Two words are anagrams if they are composed of the same letters. For example, "bagel" and "gable" are anagrams, as are "drainage" and "gardenia". In this assignment you'll write a program that reads a dictionary (a sorted list of words) and finds all the words that are anagrams of each other.

Input/Output

Your program should read a file of words separated by whitespace. You can assume the words are unique (one occurrence in the file) and sorted. Two example files are provided; you may want to create smaller examples to test your program.

The output should be a sequence of lines, with each line containing words that are anagrams of each other. For example:

    begin being binge
    caret carte cater crate trace
    argon groan organ

Coding and Algorithm

You must use the class Anaword, whose declaration is shown below and in the file anaword.h. You will need to write the implementation of this class in the file anaword.cc, although this has been started for you.

    #ifndef _ANAWORD_H
    #define _ANAWORD_H

    #include <iostream.h>
    #include "CPstring.h"

    // class designed to facilitate finding Anagrams
    // written for CPS 100, 1/16/1997
    //
    // an Anaword object prints as a regular string, but
    // compares as a sorted string
    //
    // Example: the Anaword version of the string "bagel"
    //          prints as bagel, but will be compared with
    //          other Anawords as the string "abegl", the sorted
    //          version of "bagel".  This means that the Anaword
    //          version of "gable" is equal to the Anaword version
    //          of "bagel"
    //
    // operations:
    //
    // Anaword(const string & word)    -- construct from a string
    //
    // bool Equal(const Anaword & rhs) -- compare rhs for equality
    // bool operator == (lhs, rhs)     -- compare Anawords lhs == rhs
    //
    // bool Less(const Anaword & rhs)  -- compare rhs for inequality <
    // bool operator < (lhs,rhs)       -- compare Anawords lhs < rhs
    //
    // void Print(ostream & out)       -- print anaword (unsorted)
    // ostream & operator << (ostream, -- print using <<
    //                        Anaword)

    class Anaword
    {
      public:
        Anaword(const string & word);           // construct from string
        bool Equal(const Anaword & rhs) const;  // compare for ==
        bool Less(const Anaword & rhs) const;   // compare for <
        void Print(ostream & out) const;        // print (unsorted form)
      private:
        void Normalize();                       // helper function, sorts
        string myWord;                          // regular string: "bagel"
        string mySortedWord;                    // sorted form: "abegl"
    };

    bool operator == (const Anaword & lhs, const Anaword & rhs);
    bool operator < (const Anaword & lhs, const Anaword & rhs);
    ostream & operator << (ostream & out, const Anaword & a);

    #endif

An Anaword object is constructed from a string and prints as that string, but is compared using a normalized (canonical) form created by sorting the letters of the string.
For example, the code fragment below prints the two lines of output shown.

    Anaword a("bagel");
    Anaword b("gable");
    cout << a << " " << b << endl;
    if (a == b) cout << "they're anagrams!" << endl;

Output as shown:

    bagel gable
    they're anagrams!

The objects a and b are equal because the operator == is overloaded for Anaword objects and uses a sorted form of a word for comparison. The normalized form of "bagel" and "gable" is "abegl", the sorted version of each word.

You must implement the member functions described in anaword.h so that the real word (e.g., "bagel") is used for printing, but the normalized or sorted word is used for comparison using == and <.
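
The normalization idea can be sketched as follows. This is only an illustration, not the required implementation: it assumes the standard std::string in place of the string class from CPstring.h, it uses std::sort for brevity (the assignment asks for selection or insertion sort), and the helper names SortedLetters and AreAnagrams are hypothetical and do not appear in anaword.h.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of anaword.h): returns the letters of
// word in sorted order, e.g., "bagel" becomes "abegl".
// Assumption: std::string stands in for the course's CPstring.h string.
std::string SortedLetters(const std::string & word)
{
    std::string sorted = word;
    std::sort(sorted.begin(), sorted.end());   // std::sort used only for brevity
    return sorted;
}

// Hypothetical helper: two words are anagrams when their sorted
// (normalized) forms are identical.
bool AreAnagrams(const std::string & a, const std::string & b)
{
    return SortedLetters(a) == SortedLetters(b);
}
```

In the Anaword class, the same comparison would be made between the mySortedWord fields of two objects, while myWord is kept unchanged for printing.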

Algorithm

You should read all the words in a file whose name is entered by the user. Each word should be used to construct an Anaword object using new; you'll create a vector of pointers to Anaword objects to use in your program. You must use a vector of pointers because the Anaword class doesn't have a default constructor, so it's not possible to create Vector<Anaword> a(100), for example.

The three lines below define a vector of pointers and store an Anaword object representing the string "bagel" in the first vector entry. You'll need to do something similar for every word in the file (the vector you use should grow as necessary).

    Vector<Anaword *> list(100);
    string s = "bagel";
    list[0] = new Anaword(s);

After reading all the words and creating a vector you should sort the vector. You'll need to compare Anawords using < and ==; this will require dereferencing pointers to get at the Anawords, e.g., if (*(list[0]) == *(list[1])).
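
As an illustration of sorting through pointers, here is a minimal insertion sort sketch. It assumes std::vector instead of the course's Vector class, and the name InsertionSortByPointee is hypothetical. The pointers are moved, but every comparison dereferences them so that operator < for the pointed-to objects (Anaword in this assignment) decides the order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: insertion sort on a vector of pointers, ordered
// by the objects pointed to rather than by the pointer values.
// Assumption: T provides operator <, as Anaword does.
template <class T>
void InsertionSortByPointee(std::vector<T *> & list)
{
    for (std::size_t k = 1; k < list.size(); k++)
    {
        T * hold = list[k];            // pointer to insert into sorted part
        std::size_t loc = k;
        // shift larger elements right; note the dereferences in the comparison
        while (loc > 0 && *hold < *(list[loc - 1]))
        {
            list[loc] = list[loc - 1];
            loc--;
        }
        list[loc] = hold;
    }
}
```

Comparing list[loc] < list[loc - 1] without the dereference would compare addresses, which is a common mistake when sorting vectors of pointers.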

After sorting, all anagrams will be adjacent to each other, but there will be lots of singleton words that aren't anagrams of anything. You should remove all singleton words leaving only anagrams. If the original sorted vector has N elements, your code should remove all singletons, leaving only anagrams, in O(N) or linear time.
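
One possible way to meet the linear-time requirement is to scan the sorted vector once, find each run of equal elements, and copy runs of length two or more toward the front. The sketch below is only one such approach, under these assumptions: std::vector replaces the course's Vector class, the name RemoveSingletons is hypothetical, and the discarded objects were allocated with new and can be deleted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: removes every element that is not equal to a
// neighbor in a sorted vector of pointers. Each element is examined a
// constant number of times, so the work is O(N).
// Assumption: T provides operator ==, and objects were created with new.
template <class T>
void RemoveSingletons(std::vector<T *> & list)
{
    std::size_t n = list.size();
    std::size_t kept = 0;      // next free slot for a pointer we keep
    std::size_t start = 0;     // first index of the current run
    while (start < n)
    {
        // advance end past all elements equal to list[start]
        std::size_t end = start + 1;
        while (end < n && *(list[end]) == *(list[start]))
        {
            end++;
        }
        if (end - start > 1)   // run of two or more: keep all of them
        {
            for (std::size_t k = start; k < end; k++)
            {
                list[kept++] = list[k];
            }
        }
        else                   // singleton: discard it
        {
            delete list[start];
        }
        start = end;
    }
    list.resize(kept);         // drop the unused tail
}
```

Since kept never exceeds the index being read, copying forward never overwrites a pointer that has not yet been examined.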

Once the vector has only anagrams, you can print the anagrams, one set per line.

Grading

Expectations are that you will implement the Anaword class and write code that prints all anagrams as described above. In addition, your program, anafind.cc, should use functions so that the body of main is small. To sort strings and Anaword objects you should use selection or insertion sort.

To exceed expectations you can do several things, two of which are outlined below (be creative).

This assignment is worth 20 points; it is a minor assignment. You will receive 16/20 for a program that meets all expectations reasonably well. Style of code will count for 5/20 of the points.

Submitting

You should create a README file for this and all assignments. All README files should include your name as well as the name(s) of anyone with whom you collaborated on the assignment and the amount of time you spent. In addition, you should write any comments you have about the assignment, what you liked and disliked. For this assignment you should also include your favorite anagrams in the README file.

To submit your assignment, type:

submit100 anagram README *.cc *.h Makefile

Be sure to submit all source files as shown and your Makefile.

Extra Credit

For extra credit you should implement Anaword using another method described here. You should time both implementations and write up your findings in your README file. You should take care to have enough data from running the code to back up claims you make about the two methods.

In the class Anaword, instead of normalizing by sorting each word, you should create a histogram of the number of occurrences of each letter in the word. You'll need a vector of 26 ints. Initialize each element to 0, then keep track of how many a's, b's, ... z's there are in a word. For example, the word "cabbage" has two a's, two b's, and one each of c, g, and e. If the histograms of two words are equal (this requires 26 comparisons to determine), the words are anagrams.

To submit your assignment, type:

submit100 anagram.xtra README *.cc *.h Makefile
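
The histogram method described above can be sketched as follows. The sketch makes several assumptions that the assignment does not state: it uses std::string and std::vector, it folds upper case to lower case, it ignores any character outside a to z, and it relies on the letters being contiguous in the character set (true for ASCII). The names LetterCounts and SameHistogram are hypothetical.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: counts[0] holds the number of a's, counts[1]
// the number of b's, ..., counts[25] the number of z's.
// Assumption: upper case is folded to lower case and non-letters are skipped.
std::vector<int> LetterCounts(const std::string & word)
{
    std::vector<int> counts(26, 0);    // 26 ints, each initialized to 0
    for (std::string::size_type k = 0; k < word.length(); k++)
    {
        int ch = std::tolower(static_cast<unsigned char>(word[k]));
        if (ch >= 'a' && ch <= 'z')
        {
            counts[ch - 'a']++;
        }
    }
    return counts;
}

// Hypothetical helper: two words are anagrams when all 26 counts match.
bool SameHistogram(const std::string & a, const std::string & b)
{
    return LetterCounts(a) == LetterCounts(b);   // compares element by element
}
```

Building a histogram takes one pass over the word, whereas sorting the letters with selection or insertion sort takes time quadratic in the word length; timing both, as the extra credit asks, shows whether that difference matters for words of dictionary length.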
Owen L. Astrachan
Last modified: Fri Jan 17 11:21:36 EST