Sorting sorts, CPS 100, Spring 1996

Introduction

This assignment is mostly an empirical study of different sorting algorithms whose implementations are given to you. Most of the assignment involves analyzing the algorithms on different kinds of input and making conclusions supported by the evidence you glean from your studies.

You should work in the same group for this assignment that you are working with for the tree assignment. This assignment is very amenable to group work since much of it involves gathering data.

The files for this assignment can be found in ~ola/cps100/sort.

About sortall.cc

The program sortall.cc uses templated functions to sort. Any class/type that supports the standard comparison operations (<=, >=, ==, !=, >, <) and both input and output operations using streams can be sorted (strictly speaking, I/O isn't necessary to sort). There are four parts to this assignment.

The first two parts are worth six points each; the last two are worth four points each.

Analyzing O(n^2) Sorts

There are three O(n^2) sorts implemented: selection, insertion, and bubble.
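As a rough illustration of what a templated O(n^2) sort looks like, here is a minimal insertion sort sketch. The name InsertionSortSketch and the use of std::vector are assumptions made for this example; the routines supplied in sortall.cc may be written differently. Note that only operator < is needed on the element type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the routine from sortall.cc.
// postcondition: a is sorted into nondecreasing order
template <class T>
void InsertionSortSketch(std::vector<T>& a)
{
    for (std::size_t k = 1; k < a.size(); ++k) {
        T item = a[k];                 // element to insert into a[0..k-1]
        std::size_t loc = k;
        // shift larger elements one slot right to open a position
        while (loc > 0 && item < a[loc - 1]) {
            a[loc] = a[loc - 1];
            --loc;
        }
        a[loc] = item;
    }
}
```

Each of the n elements may be shifted past up to n - 1 others, which is where the quadratic behavior comes from.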

You are to run the program and time how long selection sort, insertion sort, and bubble sort take for arrays of size 500 to 5,000 in increments of 500. You will then graph this data using the gnuplot graphing program, which will generate plots like the one below (you can use other plotting packages if you know how to use them). Think about how to minimize the time you spend generating data; some thought about how to organize the runs of the program will save time.

In order to use gnuplot you must set up data files as pairs of x,y coordinates. If selection sort of 500 ints takes 0.523 seconds and of 1000 ints takes 0.9873 seconds then a data file called selectint.data should be created whose first two lines are:

       500    0.523
       1000   0.9873

For this part of the assignment you will create nine data files: three for selection sort, three for insertion sort, and three for bubble sort, because for each sort you'll sort three different kinds of element. Each file will contain 10 lines of data, with each line consisting of a number of array elements (500 to 5,000) and the time to sort an array of that many elements, as described above. Name these files selectint.data, selectstr.data, and so on, depending on the sort and on the kind of element you're sorting. The files are automatically generated by the sorting classes; you supply a file name when the sorting-class variable is constructed.


Using gnuplot

To use gnuplot just type gnuplot at the UNIX prompt. Now you should see the gnuplot prompt: gnuplot> rather than your normal prompt. You should then type the following line, which will graph the three sets of data named by the files in quotes:
 plot "insertint.data" with linespoints 1 1, \
 "selectint.data" with linespoints 3 3, "bubbleint.data" with linespoints 5 5
The number pairs 1 1 and 3 3 indicate the style of line and point to use. You can add another file by using a comma after the "5 5", specifying bubblestr.data, and specifying "linespoints 2 2" for example. Note that the backslash \ can be used to continue typing a long line on multiple lines (but gnuplot treats this as one line). This command should generate a plot on the screen. Then you should type

 gnuplot> set ylabel "time (seconds)"
 gnuplot> set xlabel "# elements sorted"
 gnuplot> replot

which should re-generate the plot with the x-axis and y-axis labeled.

Finally you will create a version of the plot to print by typing

 gnuplot> set terminal postscript
 gnuplot> set output "squareplot.ps"
 gnuplot> replot

You can then print the plot by typing lpr squareplot.ps or print squareplot.ps. To quit gnuplot, simply type quit.


String Sorting

You should sort "smart string pointers" in addition to vectors of integers and strings. Make a plot of this data as well. A smart pointer is one that knows how to compare itself using the value pointed to (in this case a string). Define a new struct/class that uses a pointer to a string rather than the string, for example:

       struct SmartStrPtr
       {
           string * info;
       };

       bool operator < (const SmartStrPtr & s, const SmartStrPtr & t)
       // postcondition: return true if s < t (as strings), otherwise false
       {
           return *(s.info) < *(t.info);
       }
       // also define <=, >, and other needed operators

These are "smart" pointers because as pointers they require less time to swap/move since only pointers are moved rather than entire strings being re-copied.


Faster Sorts

There are several faster sorts. Most of them are discussed in Weiss and we will go over some of them in class. In doing this part of the assignment you must analyze two sorts as described below. You must also re-implement the partition function Pivot. For extra credit (4 points) you can analyze shellsort and bucket or radix sort (the latter two only work with ints unless they are heavily modified).

Timing the faster sorts

After implementing all these faster sorts you are to run the program sortall so that it sorts arrays of size 10,000, 20,000, ..., 100,000 using these faster sorts (but NOT using the O(n^2) sorts, which will take too long). You can either plot the data or include the data as part of your README file, showing how long each sort takes for ints, strings, and smart string pointers.

Range of Numbers and Pivot Function

After creating the table/data above you should run the program again but constrain the range of numbers used in the sorts to be less than 100 by invoking the program via sortall 100. Summarize any significant changes in the runtimes of the different sorts when the range of numbers is constrained as compared to when the range is larger (recall that the default range is less than 10,000). Give a brief explanation of why changes occur or do not occur.

You should then change the function Pivot used by quicksort so that, rather than splitting the array into two sections (one less than or equal to the pivot and one greater), it splits the array into three sections: one less than the pivot, one equal to the pivot, and one greater than the pivot. The section equal to the pivot does NOT need to be recursively sorted. To do this you may need to change the parameters of the function Pivot. Re-run JUST quicksort using your changed pivot function; be sure to account for these results in your README file. You will probably want to implement a CheckSorted function to determine if your new quicksort is working. It will be difficult to earn full credit without implementing such a function. CheckSorted might, for example, take two vectors and decide if one is a sorted version of the other.
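The three-section split and the checking function described above could be sketched as follows. This is one possible approach under stated assumptions, not the required solution: the names PivotThreeWay and QuickSortThreeWay, the choice of the first element as pivot, and the parameter lists are all hypothetical, and the Pivot function in sortall.cc has its own signature. CheckSorted here relies on the library std::sort as a trusted reference.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical three-way partition of a[first..last] (inclusive).
// postcondition: a[first..lt-1] <  pivot,
//                a[lt..gt]      == pivot,
//                a[gt+1..last]  >  pivot
template <class T>
void PivotThreeWay(std::vector<T>& a, int first, int last, int& lt, int& gt)
{
    T pivot = a[first];
    lt = first;
    gt = last;
    int k = first;
    while (k <= gt) {
        if (a[k] < pivot)       std::swap(a[lt++], a[k++]);
        else if (pivot < a[k])  std::swap(a[k], a[gt--]);
        else                    ++k;      // equal to pivot: leave in middle
    }
}

template <class T>
void QuickSortThreeWay(std::vector<T>& a, int first, int last)
{
    if (first >= last) return;
    int lt, gt;
    PivotThreeWay(a, first, last, lt, gt);
    QuickSortThreeWay(a, first, lt - 1);  // section less than the pivot
    QuickSortThreeWay(a, gt + 1, last);   // section greater than the pivot
    // the middle section a[lt..gt] is already in place
}

// returns true if result is a sorted rearrangement of original
template <class T>
bool CheckSorted(const std::vector<T>& original, const std::vector<T>& result)
{
    std::vector<T> copy(original);
    std::sort(copy.begin(), copy.end());  // trusted reference sort
    return copy == result;
}
```

When the range of values is small, many elements equal the pivot, so the middle section is large and the recursive calls shrink quickly.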

BucketSort

Bucket sort works when sorting integers in a limited range (and, on computers, all integers are in a limited range). In the routine BucketSort this range is specified by the additional parameter radix as noted in the comments of the routine. (This means that you CANNOT use BucketSort with the class SortBench as the class is written since the BucketSort function doesn't have the right signature/prototype.) For example, if all the numbers being sorted are in the range 0-9 (the value of radix would be 10), then the diagram below shows how "bucket" counts are determined from an array and then used to "sort" the array.

Note that the count in each bucket indicates how many occurrences of each number appear in the original array and can be used in a straightforward manner to "store" numbers in the sorted array. The numbers are not re-arranged as with other sorts; instead, the count array is used to generate an array that has the same number of occurrences of each number as the original array.
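The counting scheme just described can be sketched in a few lines. The function name BucketSortSketch and its exact parameters are assumptions for this example; the BucketSort routine in sortall.cc may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch; the BucketSort in sortall.cc may differ.
// precondition: every element of a is in the range 0..radix-1
// postcondition: a is sorted into nondecreasing order
void BucketSortSketch(std::vector<int>& a, int radix)
{
    std::vector<int> count(radix, 0);

    // count[v] becomes the number of occurrences of v in a
    for (std::size_t k = 0; k < a.size(); ++k) {
        assert(0 <= a[k] && a[k] < radix);
        ++count[a[k]];
    }

    // regenerate a from the counts, smallest value first
    std::size_t index = 0;
    for (int value = 0; value < radix; ++value) {
        for (int c = 0; c < count[value]; ++c) {
            a[index++] = value;
        }
    }
}
```

For the array 3 0 9 3 1 0 with radix 10, the counts for the values 0, 1, 3, and 9 are 2, 1, 2, and 1, and the regenerated array is 0 0 1 3 3 9. No two elements are ever compared, so the work is proportional to the array size plus radix.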

When sortall is invoked it interprets any argument as the radix used to determine the range of numbers (see the main routine). For example, sortall 1000 indicates that all numbers will be in the range 0-999. The default radix is 10,000.

Shell Sort

Shell sort is described in Weiss. The basic idea is to do a sequence of insertion sorts, but to "look" at elements that are far apart. In insertion sort an element is inserted into its proper position relative to all other elements by examining all other elements. In shell sort, an element might be inserted into the proper position relative to every 100th element rather than every element. Then elements are inserted into proper position relative to every 50th element, every 25th element, and so on until at the last stage of shell sort a regular insertion sort takes place. Because many elements are moved before this final stage, the sort is much more efficient than insertion sort. There are many more details of this algorithm in Weiss. The increments used in this version of shell sort are known as Hibbard's increments; they are of the form 1, 3, 7, ..., 2^k - 1.


What to turn in

You should submit your modified program sortall.cc and a README file containing the analysis of the routines as described above. Submit these using

       submit100 sort sortall.cc README

You should also hand in the plot of the n^2 sorts created using gnuplot. You may want to hand in a complete report rather than a README file; feel free to go completely overboard and earn corresponding extra points.