APCS Workshop, Fourth Day

Introduction

In this session you'll discuss how to implement templated classes and templated functions and how these are instantiated. The template instantiation process differs between compilers although there is one sure-fire method that works on all compilers with single-file projects, that is projects with only one file that instantiates a templated class. This method won't work with some compilers if the same templated class is instantiated in different files in the same project. We'll go over how templates work in some detail.

As a start, you'll look at the implementation of the templated stack class in apstack.cpp. You'll also study, in the afternoon, a program for benchmarking different sorts that uses several templates --- program sortall.cpp.

We'll look at the BigInt class used in conjunction with other AP classes. We'll start looking at the class apstack with a program postfixi.cpp for calculating integer postfix expressions. A run is shown below:

    prompt>  postfixi 
     12 8 * 22 + 
    result = 118
    99999 99999 * 
    result = 1409865409

(why can't the last calculation be correct?)

Discuss what changes are needed to turn postfixi.cpp into a program that evaluates BigInt postfix expressions (called, for example, postfixb.cpp.)
Discuss what modifications are needed to implement a postfix exponentiation operator using ^ (for both the int and the BigInt version of the program.)
Use an apqueue variable to store all operands and operators so that they can be echoed when the result is printed. To store both numbers and characters, convert to apstring variables and use a queue of strings. For conversion from int to string (and vice versa) see the library of functions specified in strutils.h.
show-me Write a parameterless apstack member function swap that interchanges the top two elements on a stack. Don't reference any private variables in swap.

Lab Activity

In the lab activity you'll use the apstack, apqueue, apmatrix, and BigInt classes -- but you may not get to all these exercises.

Copy postfixi.cpp to a file postfixb.cpp, modify it to use BigInt values rather than int values, and add the queue discussed in the group exercises to store the expression entered and echo it back when the result is printed.
The program bigfib.cpp computes Fibonacci numbers using integers although the result is assigned to a BigInt variable. The output is not correct as you can see from run below (which also shows the number of seconds needed to compute the result):
```
prompt> bigfib
min fib:  between 1 and 10000: 40
max fib:  between 41 and 10000: 50
k        Fib(k)         secs.

40      165580141       0
41      267914296       0
42      433494437       0
43      701408733       0
44      1134903170      0
45      1836311903      0
46      -1323752223     0
47      512559680       0
48      -811192543      0
49      -298632863      0
50      -1109825406     0
```
In two stages, you'll turn the function Fib into a templated function that returns a BigInt value. First make the Fib a void function that returns result via a reference parameter. Compile and test this version of the program to make sure it's correct.
In the second stage of the revision, make the function templated as shown below for the header:
template <class Number> void Fib(Number & result, int n) { // body here } You'll need to change the types of local variables one and two to be of type Number. After you've templated the function, call it twice from main: once with an int as the first parameter and once with a BigInt as the first parameter. Since the function is templated, the "right" one will be called automatically., It was necessary to make the function void with a reference parameter since an overloaded (and in this case templated) function must have different parameter lists.
When you have the function working, find the 5,000 Fibonacci number and how many digits it has.
Write a program to print rows 1-N of Pascal's triangle, where N is a user-entered value. You should calculate the values in the triangle in two ways, and see which method is quickest:
- Use the "standard" formula for n choose k that gives the k-th value in the n-th row of Pascal's triangle (0 <= k <= n).
```
     ( n )      n!
     (   )  = --------
     ( k )    k! (n-k)!
```
- Use an N x N matrix p of BigInt values where p[n][k] is the k-th value in the n-th row of Pascal's triangle:
```
    p[n][k] = p[n-1][k-1] + p[n-1][k]
```
  where each entry in the triangle is the sum of the entries "above" it. This means that if you initialize the 0-th and k-th entries in the k-th row (to 1) then all values can be computed by summing values in row k-1.
- Re-implement the matrix version of Pascal's triangle from the previous problem, but instead of using a matrix (in which roughly half the entries are wasted) use a ragged vector. Declare a vector p as follows: so that p is a vector of N elements, where each element of p is a vector (note the space between the > > --- this is needed to avoid confusion with the extraction operator >>.
  You can size each value of p using p[k].resize[k+1), for example. Once you've created p, your code from the previous problem should be able to fill in the ragged vector just as it did for the matrix.

Lunch

Group Activity

As a group, think about how to design a class that implements mathematical sets --- collections of elements, all of the same kind, in which no element appears twice. As you're designing the class, you might think about how it can be extended to multisets (sometimes called bags in Computer Science) in which elements can appear more than once. For this work, don't worry about the efficiency of the implementation, although you'll be designing a class that facilitates different implementations.

Your set class should allow an empty set to be constructed. We'll discuss why it's dangerous to allow construction/initialization from an element as shown for the variable intset below (which shouldn't compile.)

apset<apstring> strset; // an empty set of strings apset<int> intset(3); // illegal definition! won't compile

Your set class should have the following member functions (and more if you need them.)

makeEmpty() -- makes the set the empty set
contains(elt) -- returns true iff elt is in the set
print() -- prints all elements of the set

As a start, your set should support set union using an overloaded + operator and set intersection with an overloaded * operator. Don't forget (see the BigInt class in bigint.h) that it's usually easiest to implement += and *= first, then use these to implement + and *. It should be possible to add a single element to a set using union with the element. For example, the code below creates a set of strings that are all the unique words in a file:

apset<apstring> wordset; ifstream input("some_data_file"); apstring word; while (input >> word) { wordset += word; } wordset.print();

You probably won't be able to implement every member function as part of this exercise, but you should think about the issues below and use them to guide your design (these issues comprise the group exercises).

For this initial design of a class apset assume that elements of the set are stored in a vector in unsorted order.

Think about what private data members you'll need and name them appropriately.
You'll find your design is more extensible if you implement member functions add and find. The first adds an element to the set (precondition: it's not already in the set), the second returns some indication of whether an element is in the set, e.g., an index into a vector, a pointer to a node, etc.
These functions can be private member functions used to facilitate the implementation of the other functions.
Write a templated version of find assuming you're using a vector to store elements in the set:
show-me Write a templated member function add, that adds an item to a set. You'll need to use your private data fields from the first group problem. In implementing add assume that the item being added isn't in the set already (write this as part of the precondition).
Using add (also as part of the show-me exercise) implement the overloaded operator += which unions one set with another --- assume a vector implementation and remember that the private data members of rhs are accessible since this is a member function.
Implement operator *= for intersection of a set with another.
What will change if you use a search tree to implement sets rather than a sorted vector? Think about how you might isolate changes to make different implementations possible (this is more of a discussion question than a code-writing question).

Lab Activity

The lab activity is designed to give you some experience with templated classes. Rather than implement the apset class (we'll do that if time, and continue with it tomorrow) you'll use a program containing LOTS of different sorts, and use them to time and reason about the different sorts. Time permitting you'll also work on an apset class implemented using binary trees (if you don't know about binary trees you can use vectors or linked lists).

The program sortall.cpp has several different, templated sorting functions implemented including:

insertion sort
selection sort
bubble sort
shell sort
merge sort
quick sort

The main program is currently set up to sort using all of these except for shell and quick. It's also set up to sort both ints and strings. You should run the program to see how it works. Note that it creates a separate data file for each sort (ending in .dat). You should be careful between runs since if you don't copy the files, or change the names of the data files in the program, you'll overwrite them each time the program is run.

Set the program up to run all the n^2 sorts, and shell sort, but not the n log n sorts [i.e., run insert, select, bubble, shell, and not Merge and not Quick]. Run in increments of 100 from 100 to 1000 elements. If possible, graph the results in excel. In any case, extrapolate and determine how long it will take to bubble sort a vector of 5,000 strings.

One reason that bubble sort is slow (in addition to its natural canine tendencies) is that it's swap intensive. Swapping strings is expensive since copies of the strings are made when swapping. To help with this, and to practice with overloaded operators, you'll implement a "Smart String Pointer" struct. The beginnings of such a struct are shown below.

struct SmartStrPtr { apstring * info; bool Less(const SmartStrPtr & rhs) const; void Print(ostream & os) const; }; bool SmartStrPtr::Less(const SmartStrPtr & rhs) const // postcondition: return true if *this < rhs, else returns false { return *info < *(rhs.info); } bool operator < (const SmartStrPtr & lhs, const SmartStrPtr & rhs) // postcondition: return true if lhs < rhs (as strings), otherwise false { return lhs.Less(rhs); } // also define <=, >, <<, and other operators as needed

These are "smart" pointers because as pointers they require less time to swap/move since only pointers are moved rather than entire strings being re-copied. The Print function is used to facilitate debugging and for the PrintArray function in sortall.cpp.

You should implement the SmartStrPtr class and plot times for strings compared to smart strings. You should do these for the both O(n^2) sorts and the O(n log n ) sorts. You'll need to write another RandLoad function to create a vector of smart string pointers, you'll probably need to use new to create strings. If you implement the necessary overloaded operators, you should be able to sort vectors of SmartStrPtr objects without modifying any of the sorts, but simply by creating objects in main like:

SortBench<SmartStrPtr> insertInt("insert smartstr", InsertSort,"inssmart.dat");

If you have time, you should experiment with the following:

Implement a RandLoad for BigInt objects and sort vectors of these using the n log n sorts.
Run quicksort and mergesort only for int vectors, and restrict the range of the randomly generated numbers to be between 1 and 300 (instead of 30,000). You can do this by changing the value assigned to radix in main, or by running with a command-line argument that is the radix.
You should notice a large degradation in performance of quick sort. Think about why this is the case, and why re-writing partition to create three segments: one less than the pivot, one equal to the pivot, and one greater than the pivot will help immensely (with time you'd re-implement pivot and quick sort).
Experiment with different cutoffs for the n log n sorts calling Insert --- you could try to automate this, or try a few values to see if the cutoff is useful at all.
"Fix" bubblesort by making it stop if the elements are in order after a pass of the inner-loop (no swaps made). Decide if there's any truth to "you can't fix a broken down dog".

Extra Activities

Prelude to Last Day

On the last day we'll work in lab first, then meet as a group. In lab you'll work on materials for use in workshops, or, if you prefer, on a choice of two assignments: one using queues, vectors, and pointers; the other using several classes and binary trees to implement the "guess an animal" game. The programs can be found with the material for the last day, but you may want to think about workshop material or look over the programs now.

Owen L. Astrachan

Last modified: Thu Jun 12 10:54:37 EDT