The goals of this lab include doing the specific tasks outlined below
and understanding general concepts behind the tasks.
This lab assignment is mainly an empirical study of different sorting
algorithms whose implementations are given to you. Most of this lab
involves analyzing the algorithms on different kinds of input and
making conclusions supported by the evidence you glean from your
studies.
Lab 8 table of contents
[ O(n log n) sorts | New Quicksort Partition | Submit | Extra Credit ]
The file sortall.cc is the framework in which the different sorting algorithms will be tested in this lab. Currently sortall will do the following:
The program uses templated functions to sort. Any class/type that supports the standard comparison operations (<=, >=, ==, !=, >, <) can be sorted.
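To illustrate why a templated sort works for any type with comparison operators, here is a minimal sketch. It is not the code from sortall.cc: the name SelectionSortSketch is hypothetical, and std::vector stands in for whatever vector class the lab files use.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a templated sort (not the lab's own code).
// The only requirement on T is that operator< is defined.
template <typename T>
void SelectionSortSketch(std::vector<T>& a)
{
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        // Find the smallest element in a[i..end] ...
        std::size_t minIndex = i;
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            if (a[j] < a[minIndex]) {
                minIndex = j;
            }
        }
        // ... and move it into position i.
        std::swap(a[i], a[minIndex]);
    }
}
```

The same function sorts a std::vector<int> and a std::vector<std::string> because both element types supply operator<.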
In this section of the lab you'll copy files and then compile and run the sortall program.
First change into your cps100e subdirectory (type pwd
to verify where you are). Create a
lab8 subdirectory by typing mkdir lab8 and change
into this subdirectory (be sure to check that you're
in the lab8 subdirectory). Now copy the files for the lab
(don't forget the . when copying).
You should see the files listed below (these are links to the files in case you use Netscape, and for users outside of Duke).
Be sure you're in the lab8 subdirectory, and check to see that all files are there (type ls). Then, from an xterm window (at the prompt [1] ola@teer8% or similar) compile the first version of the program by typing: make sortall. This will compile sortall.cc and link it with the library libtapestry.a. Now run the program by typing: sortall at the prompt. The output should look like the following:
sortall will also create two output files, selectint.data and selectstr.data, which should contain just the numbers that were printed on the screen; i.e., selectint.data should look similar to the following:
You can check this by loading selectint.data into emacs, or you can type cat selectint.data in the xterm window to see the file.
In this part of the lab you will create two data files (one for sorting ints and one for strings) for three different sorts:
There will be a total of six files, two for each sort: one for ints and one for strings. You must modify sortall.cc so that each sort is run for different sized vectors. You should time vectors of size 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, and 5000 elements.
You will need to do this in one run of your program. Also, you will need to time each sort on the same initial "random" vectors to make valid comparisons between the different sorting algorithms. Once you have finished this, you will use gnuplot to generate graphs of your data.
To add new sorts, you'll need to add new SortBench variables. For example, the variables below can be used to time insertion sort:
The three arguments to the SortBench constructor are, respectively,
Be sure to sort an unsorted vector with each sort, and use the same unsorted vector each time in order to make valid comparisons between the sorts. To do this you'll assign the saved unsorted vectors, storeInt and storeString, to aInt and aString before each sort.
You should probably try running the program for vectors of size 500 and 1000 first. It takes a while for all the sorts to run for vectors sized from 500 to 5000, so be sure your program is generating all six data files properly before doing the final timings.
To use gnuplot just type gnuplot at the UNIX prompt. Now you should see the gnuplot prompt: gnuplot> rather than your normal prompt. To plot a datafile you should type the following:
If you mess up typing, you'll have to type the command again. You can abbreviate linespoints as linesp and you can abbreviate with as w. (To quit gnuplot type quit at the gnuplot prompt.)
The number pairs 1 1 and 3 3 indicate the style of line and point to use. For example, you can add another file by putting a comma after the 5 5, then naming bubblestr.data and specifying "linespoints 2 2".
This command should generate a plot on the screen. Now to label the axes type:
Finally you will create a version of the plot to print by typing
You should put your observations on these sorts in your README file (see below on what to submit).
There are several faster sorts. Most of them are discussed in Weiss and we will go over some of them in class. For this part of the assignment, make a copy of sortall.cc into sortall2.cc by typing:
In doing this part of the assignment you must analyze two sorts as described below.
First analyze the faster sorts by modifying the program sortall2 so that it sorts arrays of size 10000, 20000, ..., 50000 using the faster sorts. DO NOT USE THE SORTS FROM THE LAST PART OF THE LAB! They will take too long. You can plot the data and then include the data as part of your README file showing how long each sort takes for ints and strings. The quicksort and mergesort functions are named QuickSort and MergeSort, respectively.
(In the changes here don't call the output files storing times the same names you used in the first part of the lab --- you might change the file names to quick2.data, for example.)
Change sortall2.cc so that it uses only quicksort and only with a vector of 10,000 elements (so don't use mergesort). Recompile the program and then run it by typing sortall2 100. This will change the RandLoad function so that there are only numbers from 1 to 100, but there will be 10,000 of these numbers. You should notice a big increase in the time to sort the integers compared with your previous run of quicksort on a vector of 10,000 elements.
To "fix" this problem, you're going to write a new version of quicksort that splits the array into 3 sections: one less than the pivot, one equal to the pivot, and one greater than the pivot.
You should write a new function Pivot3, using Pivot as a model. Pivot3 splits the array into the 3 sections. The section equal to the pivot does NOT need to be recursively sorted. The prototype for Pivot3 may be different from that of Pivot because, instead of returning a single pivot location, you'll need to return two values: one for the end of the less-than section and one for the beginning of the greater-than section. A diagram that might help you develop the code is given below.
If the k-th element is equal to the pivot you can swap it into the end of the equal section much as the original pivot function works. If the k-th element is less than the pivot, you can first swap it into the beginning of the equal section (and bump less), then swap the new k-th element, which is now equal to the pivot, into the end of the equal section (bumping equal). You'll probably need to think about this to get it right. To initialize less and equal note that there will be one element equal to the pivot (that's the pivot element) and NO elements less than the pivot element (so less must be initialized to a location not in the range of locations being partitioned). Write a new function quicksort3 that calls Pivot3 instead of Pivot.
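The swapping scheme described above can be sketched as follows. This is one possible reading, not the required solution, and it rests on assumptions: std::vector with int indices stands in for the lab's vector class, the pivot is simply the first element (the lab's Pivot may choose differently), the two return values are packed into a std::pair, and the CUTOFF switch to insertion sort is omitted.

```cpp
#include <utility>
#include <vector>

// Sketch of a three-way partition of a[first..last] (inclusive).
// During the scan: a[first..less] < pivot, a[less+1..equal] == pivot,
// a[equal+1..k-1] > pivot.
// Returns (less, equal + 1): the end of the less-than section and the
// beginning of the greater-than section.
template <typename T>
std::pair<int, int> Pivot3(std::vector<T>& a, int first, int last)
{
    T pivot = a[first];     // assumption: pivot is the first element
    int less = first - 1;   // no elements less than the pivot yet
    int equal = first;      // the pivot itself is the one equal element

    for (int k = first + 1; k <= last; ++k) {
        if (a[k] == pivot) {
            ++equal;
            std::swap(a[equal], a[k]);
        } else if (a[k] < pivot) {
            ++less;
            std::swap(a[less], a[k]);   // a[k] now holds a value equal to the pivot
            ++equal;
            std::swap(a[equal], a[k]);
        }
        // a[k] > pivot: nothing to do, it already extends the greater-than section
    }
    return std::make_pair(less, equal + 1);
}

// Recursive driver; the equal section is never sorted again.
template <typename T>
void quicksort3(std::vector<T>& a, int first, int last)
{
    if (first < last) {
        std::pair<int, int> bounds = Pivot3(a, first, last);
        quicksort3(a, first, bounds.first);
        quicksort3(a, bounds.second, last);
    }
}
```

Because less starts at first - 1, signed indices are used here; with many duplicate keys, each call removes every copy of the pivot value from further recursion.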
Check your new sorting function to make sure you haven't lost elements. Use the PrintArray function on a randomly generated vector of size 40 to make sure that your vector is sorted. Note: When quicksort is applied to small vectors (size 20 or less) it calls insertionsort, so your pivot code may not be tested unless you change the value of the constant CUTOFF to be 0.
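Besides printing the vector, the check for lost elements can be automated. The helper below is a hypothetical sketch (the name SortedAndComplete is not part of the lab files) that assumes std::vector and compares the sorted result against a saved copy of the original.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true when 'sorted' is in nondecreasing order and
// holds exactly the same elements as 'original' (nothing lost or duplicated).
template <typename T>
bool SortedAndComplete(const std::vector<T>& original,
                       const std::vector<T>& sorted)
{
    return original.size() == sorted.size()
        && std::is_sorted(sorted.begin(), sorted.end())
        && std::is_permutation(original.begin(), original.end(),
                               sorted.begin());
}
```

std::is_permutation can take quadratic time, which is fine for a 40-element test vector but too slow for the large timing runs.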
When you've finished debugging quicksort3, run the new n log n sorts for integer vectors only using sizes 10000, 11000, 12000, 13000, 14000, 15000 using mergesort, quicksort, and quicksort3. Make sure you type sortall2 100 for your tests so that you're sorting integers constrained to be between 0 and 99.
submit100e lab8.N README sortall.cc sortall2.cc
You can enter the files in any order.
Remember that every assignment must have a README file submitted with it (please use all capital letters). Include your name, the date, and an estimate of how long you worked on the assignment in the README file. You must also include a list of names of all those people with whom you collaborated on the assignment.
In your README file you should include the output generated by your sorting programs and an explanation of why quicksort does so badly when the numbers sorted are constrained to be less than 100 and why the modified partition code improves the timings.
You also must turn in the inlab questions either by turning in the sheet during lab or by submitting the answers with your README.
Submit a README file with the timings, and turn in a hardcopy of a graph comparing the two sorts (use gnuplot or some other plotting program). To submit, use submit108 lab8.1.xtra (substitute your section number for 1).