APCS Workshop, Second Day

Introduction

Today you'll explore reading and writing from streams, use of the apstring and apvector member functions, pointers, structs, and classes. You'll also use a class that helps measure execution times of different code segments.

Classes and libraries used include:

Streams, strings, timing

We'll discuss two programs as a springboard to the syntax and semantics of C++. These two programs are:

Sample runs of each program are shown below, user-entered input is in italics.

Only the last part of the run of words.cpp is shown below, the output is lengthy (all the words in a data file).

  prompt> enter name of file: data\poe.txt
   ...

   1       rampart
   1       half
   1       century
   1       mortal
   1       disturbed
   1       pace
   1       requiescat

   # of words = 809
   total time for update = 1.3

Here is a run of cdsum.cpp on a file of Van Morrison's The Best of Van Morrison. The first three lines of the file vanmor.dat are

3:46    Bright Side Of The Road
2:36    Gloria
4:31    Moondance

A run of the program is shown below:

   enter name of data file: data\vanmor.dat
   0:03:46     Bright Side Of The Road
   0:02:36     Gloria
   0:04:31     Moondance
   0:02:40     Baby Please Don't Go
   0:04:19     Have I Told You Lately
   0:03:04     Brown Eyed Girl
   0:04:21     Sweet Thing
   0:03:22     Warm Love
   0:03:57     Wonderful Remark
   0:02:57     Jackie Wilson Said (I'm In Heaven When You Smile)
   0:03:14     Full Force gail
   0:04:28     And It Stoned Me
   0:02:46     Here Comes The Night
   0:03:04     Domino
   0:04:05     Did Ye Get Healed
   0:03:32     Wild Night
   0:04:40     Cleaning Windows
   0:04:54     Whenever God Shines His Light
   0:04:54     Queen Of The Slipstream
   0:04:44     Dweller On The Threshold
   ------------------------------
   total = 1:15:54

Group Activity

  1. Create two constructors for the WordStat struct in words.cpp: a default constructor, and a constructor that takes one string as an argument, initializes the info field with the string, and sets the count field to 1 --- you can use default parameters for the count field if you want to, but it should be possible to define:
              WordStat ws("hello");
              WordStat empty;
    
    Indicate what changes this engenders in the program

  2. What changes are needed to use binary search instead of sequential search in words.cpp

  3. show-me Design a struct named CDinfo to hold both the time of a CD-track, the number of the track (1, 2, ...), and the title of the track. Then read information from a file (as in cdsum.cpp), but store it in a array/apvector of structs, and print the elements sorted by track time, shortest track first. Write a prototype for a function to sort a vector of CDinfo structs, but don't write the function itself, just call it. Do, however, show the definition of the struct, an array of structs, and the modified loop from cdsum.cpp to read information into the array.

  4. Write a member function Less with prototype below, and use it to implement operator < for ClockTime objects.
       bool ClockTime::Less(const ClockTime & rhs)
       // postcondition: returns true if *this < rhs, otherwise returns false
    
  5. Implement operator == for ClockTime objects by creating a member function Equal similar to the function Less in the previous exercise.

Lab Activity

The goal in the morning lab is to get more practice with structs, vectors, and using classes. There are several parts of the lab, you probably won't finish all of them in the morning, but the afternoon session continues with activities related to this one.

  1. Modify the program cdsum.cpp by adding some or all of the features below.

    • Read information into an array of CDinfo structs (see the group exercise). Print the information in the array after reading.

    • Sort the CDs read by length of track. Do this by overloading operator < for CDinfo objects (implement member function Less as discussed in the group exercises).

    • Implement a Shuffle function that randomly permutes the CDinfo tracks stored in an array. Print the contents before and after shuffling.

  2. Modify words.cpp to use binary search and shift elements as needed to add words in sorted order rather than at the end of the vector. Time how long this method of word-search takes and compare it to the sequential search version. Move the calls of StripPunc and ToLower into the Update function.

Lunch

Group Activity

The goal of the afternoon session is to practice with class design, operator overloading, and using classes and streams. The program wordify.cpp will be used as the basis around which a new class will be built and an existing class modified --- it's a class-version of the program words.cpp used in the morning session.

Struct to Class

The first activity will be to turn the struct WordStat into a class. At first, add member functions to the struct, but leave the data public. Eventually you'll move the struct to a full class with public functions, and private data.

The WordStat struct is used in the functions WordList::Update and WordList::Search. In both functions different fields of the struct are accessed directly, either to compare info to a string or to set/increment the count field.

  1. Create two constructors for the WordStat struct. (see the morning exercise --- this is a repeat).

  2. Create three WordStat member functions:
    1. Print takes on ostream parameter and writes the WordStat object to the stream.

    2. Equal takes a WordStat object as a (const reference) parameter named rhs and returns true iff rhs and *this are equal, where only info fields are checked to determine equality.

    3. show-me: Less is the analog of of equal but returns true iff *this is less than parameter rhs.

  3. Now overload all comparison operators: <, >, ==, <= >=, and != using the member functions Equal and Less. You should implement the four operators below as show-me operators, and think about the others.

    • bool operator == (const WordStat & lhs, const WordStat & rhs)
    • bool operator < (const WordStat & lhs, const WordStat & rhs)
    • bool operator > (const WordStat & lhs, const WordStat & rhs)

    You should also implement (or think about) an overloaded insertion operator <<.

  4. Write a parameterless member function Increment that bumps the value of count by 1.

Explain how to modify the code in WordList::Search to take advantage of the overloaded operators so as not to access any info fields directly --- you should think about overloading operator == and operator < (and others if you see fit).

Similarly, what modifications are needed in WordList::Update and WordList::Print to use overloaded member functions and not make any direct access to fields info and count.

Lab Activity

In the lab you'll make modifications to the program wordify.cpp based on the group exercises. There may not be time to do all the modification.

The first exercise is to implement the changes discussed in the group activities so that WordStat is a class with member functions instead of a struct. You'll need to implement two constructors, overloaded comparison operators (using member functions Less and Equal), printing functions/operators, and an Increment function.

Make sure that data is private, this will ensure that you can't access it in any of the WordList member functions --- you'll be forced to use WordStat member functions to access/alter the values stored in WordList::myList.

When you've made these changes, time the program on data/poe.txt and (if machines have enough memory) on data/hamlet.txt. Write the times down since you'll be making changes to the program and re-timing.

Make a copy of the working program so that you can go back to it if you need to, then begin converting wordify.cpp to use pointers as described below.

Pointers to Structs

Much of the time in the WordList::Update function is spent shifting elements. Each shift requires a string and an integer to be copied from one array cell to another. You'll make shifting faster by using pointers to WordStat objects rather than the objects themselves. Change the declaration of myList in the class WordList to apvector<WordStat *> myList; You should initialize all values in myList to NULL. You can do this with a a loop in the body of the constructor or (at least on some compilers) by using the definition myList(size,NULL) in the initializer list.

Now you'll need to search for uses of myList[...] and change them as appropriate. For example, in the original version of wordify.cpp you'll find (in WordList::Search)

if (myList[mid].info == key)

You've probably changed this to what's shown below to take advantage of overloaded operator ==.

if (myList[mid] == key)

Now you'll need to write

if (*myList[mid] == key) to dereference the pointer in myList[mid] and get to the object it points to. You'll make similar changes elsewhere, but do not change the assignment in the loop of WordList::Update myList[loc+1] = myList[loc]; this stays the same to copy pointers rather than objects. When your program works time it again on the same data files you used originally.

Tracking Line Numbers

Instead of tracking just the number of occurrences, you'll modify the program (either the original wordify.cpp or the pointer-modified version) to track the line numbers on which each word occurs. This will require modifying the WordStat class and using string streams to read. Note: string streams are NOT part of the APCS C++ subset. However, it's useful to know how to read line oriented data.

To read lines at a time, you'll replace the reading loop test

while (input >> word) // reading word succeeded with the test while (getline(input,word)) // reading word succeeded When the getline function succeeds, one line of text is read into the string word. To read strings from the word, you could use apstring::find to search for spaces, but it's easier to use a string stream.

You access the class istrstream (input string stream) by including the header file <strstream.h>. Then you make the modifications shown below to the reading loop from wordify.cpp.

while (getline(input,word)) { istrstream sstream(word.c_str()); string s; while (sstream >> s) { // process s, i.e., tolower, strippunc, call UpDate } } You must define the variable sstream inside the loop so that it's constructed each time the outer loop iterates.

You'll also need to modify the WordStat class to track line numbers. Add an array/apvector of integers (and a count of the number of values stored in it) and update the Print function to print the array and the Increment function to take a line number as a parameter. Don't store the same line number more than once --- if the word "the" occurs twice on line 82, then 82 should appear only once in the vector of line numbers associated with "the".

Linked Lists

This part of the lab is aimed at the AB course. If you're not comfortable with linked lists you should work on another problem. One is suggested at the end of the lab.

Instead of using an array of pointers, you'll make modifications so that a linked list of WordStat objects is used. You can do this in two ways: make the WordStat class back into a struct (public data) and add a new field WordStat * next. However, it would be better to create a new struct Node as shown below.

struct Node { WordStat info; Node * next; Node (const WordStat & ws, Node * follow = NULL) : info(ws), next(follow) {} }; This struct can be declared within the private section of the WordList class, or outside the class. Then you should use Node * myFirst; // points to first node Node * myLast; // points to last node in place of the private WordList::myList variable. Create a dummy/header node in the WordList constructor. You'll need to make many changes in the code to deal with linked lists. Searching for values requires changing WordList::Search. In WordList::Update you'll need to add new nodes (at the end of the list) using statements similar to: myLast->next = new Node(WordStat(word)); myLast = myLast->next;

When you've got this working, try the experiments below.

  1. Each time a word is found (e.g., in WordList::Update), move the node to the front of the list. The idea here is that frequently accessed words will be moved to the front of the list. You'll need to think about the best way to unlink a node i.e., you could alter WordList::Search to return two values: a pointer to a node and a pointer to the node before, or you could find the node before with a loop if the search is successful.

    Time this program --- if it's faster why? if not, why not?

  2. Instead of moving a node all the way to the front, move it one place closer to the front. Time again.

  3. Instead of adding new nodes at the end of the list, add new nodes at the front of the list. Time again.

Extra Activities

Key Word In Context This problem is (extensively) described in a document about key-word-in-context (KWIC). A large portion of the design of this problem is provided for you. The idea is to get used to multi-class programs so you get a feel for object oriented (class-based) design. You won't finish this program in one day, we'll discuss the program and the design on the last day of the workshop.
Owen L. Astrachan
Last modified: Thu Jun 12 09:51:18 EDT