CPS 100E, Fall 1996 Lab 8

Sorting and Evaluation of Algorithms...

(You may find it easier to read this lab using Netscape or another browser; the URL is http://www.cs.duke.edu/~ola/courses/cps100e/lab/lab8.html)

The goals of this lab include doing the specific tasks outlined below and understanding general concepts behind the tasks.


This lab assignment is mainly an empirical study of different sorting algorithms whose implementations are given to you. Most of this lab involves analyzing the algorithms on different kinds of input and making conclusions supported by the evidence you glean from your studies.

Lab 8 table of contents

[ Introduction to Lab | Compiling/Running sortall | Analyzing/graphing O(n^2) sorts ]

[ O(n log n) sorts | New Quicksort Partition | Submit | Extra Credit ]

Introduction to Lab

The file sortall.cc is the framework in which the different sorting algorithms will be tested in this lab. Currently sortall will do the following:

The program uses templated functions to sort. Any class/type that supports the standard comparison operations (<=, >=, ==, !=, >, <) can be sorted.
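To illustrate what "templated" means here, the sketch below shows one selection sort that works for both ints and strings. It is not the code in sortall.cc: the name SelectSortSketch is hypothetical, and it assumes the modern standard library (std::vector, std::swap) rather than the classes used in the lab.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not the lab's own code: selection sort written as a
// template. The only comparison it asks of T is operator<.
template <typename T>
void SelectSortSketch(std::vector<T>& a)
{
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t minIndex = i;              // index of smallest element seen so far
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            if (a[j] < a[minIndex]) {
                minIndex = j;
            }
        }
        std::swap(a[i], a[minIndex]);          // move the smallest into position i
    }
}
```

Calling SelectSortSketch on a std::vector<int> and on a std::vector<std::string> compiles the same function body twice, once per element type.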

Compiling/Running sortall

In this section of the lab you'll copy files and then compile and run the sortall program.

First change into your cps100e subdirectory (type pwd to verify where you are). Create a lab8 subdirectory by typing mkdir lab8 and change into this subdirectory (be sure to check that you're in the lab8 subdirectory.) Now copy the files for the lab (don't forget the . when copying).

cp ~ola/cps100e/lab8/* .

You should see the files listed below (these are links to the files in case you use Netscape, and for users outside of Duke).

Be sure you're in the lab8 subdirectory, and check to see that all files are there (type ls). Then, from an xterm window (at the prompt [1] ola@teer8% or similar) compile the first version of the program by typing: make sortall. This will compile sortall.cc and link it with the library libtapestry.a. Now run the program by typing: sortall at the prompt. The output should look like the following:

select int 500 = 0.1 seconds
select string 500 = 0.21 seconds
select int 1000 = 0.45 seconds
select string 1000 = 0.83 seconds

sortall will also create two output files, selectint.data and selectstr.data, which should contain just the numbers that were printed on the screen, i.e., selectint.data should look similar to the following:

500 0.1
1000 0.45

You can check this by loading selectint.data into emacs, or by typing cat selectint.data in the xterm window.
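The data files hold one "size time" pair per line, which is the layout gnuplot reads. As a rough sketch of how such a file could be produced (the struct TimingPoint and the function WriteTimings are hypothetical names, not part of sortall.cc):

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical record: one timing measurement.
struct TimingPoint {
    int size;         // number of elements sorted
    double seconds;   // elapsed time for that sort
};

// Writes one "size seconds" pair per line to any output stream
// (an std::ofstream for a .data file, or std::cout for the screen).
void WriteTimings(std::ostream& out, const std::vector<TimingPoint>& points)
{
    for (const TimingPoint& p : points) {
        out << p.size << " " << p.seconds << "\n";
    }
}
```

Passing an std::ofstream opened on "selectint.data" would write the file; passing std::cout shows the same lines on the screen.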



Analyzing and graphing O(n^2) sorts:

In this part of the lab you will create two data files (one for sorting ints and one for strings) for three different sorts:

There will be a total of six files, two for each sort: one for ints and one for strings. You must modify sortall.cc so that each sort is run for different sized vectors. You should time vectors of size 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, and 5000 elements.

You will need to do this in one run of your program. Also, you will need to time each sort on the same initial "random" vectors to make valid comparisons between the different sorting algorithms. Once you have finished this, you will use gnuplot to generate graphs of your data.

To add new sorts, you'll need to add new SortBench variables. For example, the variables below can be used to time insertion sort:

SortBench<INT> insertInt("insert int",InsertSort,"insertint.data");
SortBench<string> insertString("insert string",InsertSort, "insertstr.data");

The three arguments to the SortBench constructor are, respectively,

Be sure to sort an unsorted vector with each sort, and this should be the same unsorted vector in order to make valid comparisons between the sorts. To do this you'll assign the saved unsorted vectors, storeInt and storeString to aInt and aString before sorting.
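The idea of restoring the saved vector before every timing can be sketched as follows. This is not the SortBench class from the lab: TimeOneSort is a hypothetical function, and it assumes std::vector and std::chrono are available. The parameters store and work play the roles of storeInt and aInt.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical stand-in for the lab's timing code. It copies the saved
// unsorted data into the working vector, runs the given sort on the working
// vector, and returns the elapsed time in seconds.
template <typename T, typename SortFn>
double TimeOneSort(const std::vector<T>& store, std::vector<T>& work, SortFn sortFn)
{
    work = store;   // every sort starts from the same unsorted input
    auto start = std::chrono::steady_clock::now();
    sortFn(work);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Because store is passed as const and copied on every call, a second or third sort timed with the same store sees exactly the input the first one saw.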

You should probably try running the program for vectors of size 500 and 1000 first. It takes a while for all the sorts to run for vectors sized from 500 to 5000, so be sure your program is generating all six data files properly before doing the final timings.

Using gnuplot

To use gnuplot just type gnuplot at the UNIX prompt. Now you should see the gnuplot prompt: gnuplot> rather than your normal prompt. To plot a datafile you should type the following:

plot "selectint.data" with linespoints 1 1

This will pop up a new window with the data in the file selectint.data graphed. You can plot several data files in the same graph. For example, to plot all three files that store runs of sorting integers you can plot using the gnuplot command below. When you type the backslash \ gnuplot will prompt with > since it knows that you're continuing a command. Just press return when you're done with all three plotting instructions.

plot "insertint.data" with linespoints 1 1, \
"selectint.data" with linespoints 3 3, \
"bubbleint.data" with linespoints 5 5

If you mess up typing, you'll have to type the command again. You can abbreviate linespoints as linesp and you can abbreviate with as w. (To quit gnuplot type quit at the gnuplot prompt.)

The number pairs 1 1 and 3 3 indicate the style of line and point to use. For example, you can add another file by typing a comma after the 5 5, then naming "bubblestr.data" and specifying "with linespoints 2 2".

This command should generate a plot on the screen. Now to label the axes type:

gnuplot> set ylabel "time (seconds)"
gnuplot> set xlabel "# elements sorted"
gnuplot> replot

Finally you will create a version of the plot to print by typing

gnuplot> set terminal postscript
gnuplot> set output "squareplot.ps"
gnuplot> replot

This creates a file called squareplot.ps which you can send to the printer by typing lpr squareplot.ps. You may have to specify the printer to use, e.g., lpr -Pteerlp1 squareplot.ps. To quit gnuplot you simply type quit.

You should put your observations on these sorts in your README file (see below on what to submit).



Analyzing O(n log n) sorts:

There are several faster sorts. Most of them are discussed in Weiss and we will go over some of them in class. For this part of the assignment, make a copy of sortall.cc into sortall2.cc by typing:

cp sortall.cc sortall2.cc

In doing this part of the assignment you must analyze two sorts as described below.

First analyze the faster sorts by modifying the program sortall2 so that it sorts arrays of size 10,000, 20,000, ..., 50,000 using the faster sorts. DO NOT USE THE SORTS FROM THE LAST PART OF THE LAB! They will take too long. You can plot the data and then include the data as part of your README file showing how long each sort takes for ints and strings. The quicksort and mergesort functions are named QuickSort and MergeSort, respectively.



Re-Implementing the Pivot Code for Quicksort

(In the changes here don't call the output files storing times the same names you used in the first part of the lab --- you might change the file names to quick2.data, for example.)

Change sortall2.cc so that it uses only quicksort and only with a vector of 10,000 elements (so don't use mergesort). Recompile the program and then run it by typing sortall2 100. This will change the RandLoad function so that there are only numbers from 1 to 100, but there will be 10,000 of these numbers. You should notice a big increase in the time to sort the integers compared with your previous run of quicksort on a vector of 10,000 elements.
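To make the effect of the command-line argument concrete, here is a sketch of loading a vector with values drawn from a small range. It is not the lab's RandLoad: the function LoadConstrainedRandom and its parameters are hypothetical, and it assumes the standard <random> header. With 10,000 values and only 100 possible distinct values, each value appears about 100 times on average, so the vector is full of duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the lab's RandLoad: returns `count` pseudo-random
// ints drawn uniformly from the closed range [low, high]. The seed is a
// parameter so that the same "random" vector can be regenerated.
std::vector<int> LoadConstrainedRandom(std::size_t count, int low, int high, unsigned seed)
{
    assert(low <= high);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(low, high);
    std::vector<int> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        result.push_back(dist(gen));
    }
    return result;
}
```

For example, LoadConstrainedRandom(10000, 1, 100, 42u) yields 10,000 values that all lie between 1 and 100.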

To "fix" this problem, you're going to write a new version of quicksort that splits the array into 3 sections: one less than the pivot, one equal to the pivot, and one greater than the pivot.

You should write a new function Pivot3. You should use Pivot as a model for this function. Pivot3 splits the array into the 3 sections. The section equal to the pivot does NOT need to be recursively sorted. The prototype for Pivot3 may be different from the function Pivot because instead of returning a pivot you'll need to return two values: one for the end of the less-than section and one for the beginning of the greater than section. A diagram that might help you develop the code is given below.

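[Diagram: the range being partitioned, with the less-than section ending at less, the equal section ending at equal, and the k-th element being examined]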

If the k-th element is equal to the pivot you can swap it into the end of the equal section much as the original pivot function works. If the k-th element is less than the pivot, you can first swap it into the beginning of the equal section (and bump less), then swap the new k-th element, which is now equal to the pivot, into the end of the equal section (bumping equal). You'll probably need to think about this to get it right. To initialize less and equal note that there will be one element equal to the pivot (that's the pivot element) and NO elements less than the pivot element (so less must be initialized to a location not in the range of locations being partitioned). Write a new function quicksort3 that calls Pivot3 instead of Pivot.
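One possible reading of the steps above is sketched below. It is only a sketch under stated assumptions: the names Pivot3Sketch and QuickSort3Sketch are hypothetical, the first element of the range is taken as the pivot (the lab's Pivot may choose differently), std::vector stands in for the lab's vector class, and the insertion-sort cutoff is left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical three-way partition of a[first..last] (inclusive).
// Invariant while scanning with k:
//   a[first..less]     <  pivot
//   a[less+1..equal]   == pivot   (never empty: it always holds the pivot)
//   a[equal+1..k-1]    >  pivot
// On return, lessEnd is the last index of the less-than section and
// greaterBegin is the first index of the greater-than section.
template <typename T>
void Pivot3Sketch(std::vector<T>& a, int first, int last, int& lessEnd, int& greaterBegin)
{
    assert(0 <= first && first <= last && last < static_cast<int>(a.size()));
    const T pivot = a[first];   // assumption: first element is the pivot
    int less = first - 1;       // no elements less than the pivot yet
    int equal = first;          // the pivot itself is the equal section
    for (int k = first + 1; k <= last; ++k) {
        if (a[k] == pivot) {
            ++equal;
            std::swap(a[equal], a[k]);   // extend the equal section
        } else if (a[k] < pivot) {
            ++less;
            std::swap(a[less], a[k]);    // a[k] now holds a pivot-equal value
            ++equal;
            std::swap(a[equal], a[k]);   // put it at the end of the equal section
        }
        // a[k] > pivot: already in the greater-than section, nothing to do
    }
    lessEnd = less;
    greaterBegin = equal + 1;
}

// Hypothetical quicksort that skips the equal section when recursing.
template <typename T>
void QuickSort3Sketch(std::vector<T>& a, int first, int last)
{
    if (first >= last) {
        return;   // zero or one element: already sorted
    }
    int lessEnd = 0;
    int greaterBegin = 0;
    Pivot3Sketch(a, first, last, lessEnd, greaterBegin);
    QuickSort3Sketch(a, first, lessEnd);
    QuickSort3Sketch(a, greaterBegin, last);
}
```

Indices are plain ints here so that less can start at first - 1 even when first is 0. Reference parameters are one way to hand back the two boundaries; a small struct would work equally well.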

Check your new sorting function to make sure you haven't lost elements. Use the PrintArray function on a randomly generated vector of size 40 to make sure that your vector is sorted. Note: When quicksort is applied to small vectors (size 20 or less) it calls insertionsort, so your pivot code may not be tested unless you change the value of the constant CUTOFF to be 0.
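Besides printing the vector, the check can be automated. The sketch below is one way to do that, assuming a C++11 standard library; the function name IsSortedPermutation is hypothetical. It compares a copy of the vector made before sorting with the vector after sorting.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical checker: true only if `after` is in nondecreasing order and
// contains exactly the same elements as `before` (nothing lost, nothing
// duplicated). std::is_permutation can take quadratic time, which is fine
// for a test vector of size 40.
template <typename T>
bool IsSortedPermutation(const std::vector<T>& before, const std::vector<T>& after)
{
    return before.size() == after.size()
        && std::is_sorted(after.begin(), after.end())
        && std::is_permutation(before.begin(), before.end(), after.begin());
}
```

A typical use is to copy the vector, sort the original, and then assert that IsSortedPermutation(copy, original) holds.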

When you've finished debugging quicksort3, run the new n log n sorts for integer vectors only using sizes 10000, 11000, 12000, 13000, 14000, 15000 using mergesort, quicksort, and quicksort3. Make sure you type sortall2 100 for your tests so that you're sorting integers constrained to be between 0 and 99.



Submitting The Lab

To submit assignments you'll run the command below, but substitute your section number (1, 2, or 3) for N.
    submit100e lab8.N README sortall.cc sortall2.cc

You can enter the files in any order.

Remember that every assignment must have a README file submitted with it (please use all capital letters). Include your name, the date, and an estimate of how long you worked on the assignment in the README file. You must also include a list of names of all those people with whom you collaborated on the assignment.

In your README file you should include the output generated by your sorting programs and an explanation of why quicksort does so badly when the numbers sorted are constrained to be less than 100 and why the modified partition code improves the timings.

You also must turn in the inlab questions either by turning in the sheet during lab or by submitting the answers with your README.


Extra Credit

Use the function RadixSort, defined in radix.cc, to time how long it takes to sort vectors of integers. Use vectors of size 10,000 to 100,000 in increments of 5,000 and compare the time to quicksort (so be sure that you sort the same vectors with QuickSort and RadixSort).

Submit a README file with the timings, and turn in a hardcopy of a graph comparing the two sorts (use gnuplot or some other plotting program). To submit, use submit108 lab8.1.xtra (substitute your section number for 1).