CS 221 / ECE 259
Advanced Computer Architecture II

Spring 2008
Alvin R. Lebeck

 

Objective:

The goal of this course is to study the design and programming of parallel computers. Topics include programming models, memory systems (coherence and consistency), interconnection networks, and evaluation.

Prerequisites:

CS 220, ECE 252, or consent of instructor. It is assumed that you are familiar with the material covered in those courses. Please see the instructor if you have any questions about your background.

Location:

D243 LSRC, Monday & Wednesday, 10:05 to 11:20

Instructor:

Professor Alvin R. Lebeck
Office: D308 LSRC
Office Hours: Tues & Wed 3:00 to 4:00
Email: alvy (you figure out the rest)

Schedule is here

Grading:

This course is more like a seminar than a traditional lecture course. Student involvement is an integral part of the course, so students are expected to read research papers and to present research in both written and oral form. There is very little busy work. We will read a lot of papers; if you have difficulty reading, writing, or speaking, you will be expected to work hard to improve these skills.

Grades are based on:

·         Leading discussion in class: 15%

·         Class participation during discussions: 15% (if you aren’t in class, you aren’t participating)

·         Programming assignments: 10%

·         Final Exam: 20% (exam here; due April 29, 2pm)

·         Individual or group project: 40% (project ideas here; Duke NetID authentication required)

Deadlines will be enforced except in truly exceptional circumstances. It is better to submit something nearly finished than to wait and submit it late.

Academic Misconduct: I have a zero-tolerance policy for academic misconduct; this includes cheating on the exam and plagiarism on the project. If you have not worked on research projects of the type expected in this course, you need to be particularly careful about citing previous work and crediting others’ research.

Programming assignments:

Your job for this assignment is to implement a non-trivial algorithm using a variety of programming models: several for shared memory and one for message passing. The shared memory programming models include pthreads; the message passing model is MPI.

Programming #1 Due Jan 30

Implement parallel matrix multiplication using pthreads on the shared memory multiprocessor twister.cs.duke.edu. This must be your own version; do not simply download one from somewhere. Pick a reasonable matrix size so that you can run the program with 1, 2, 4, 8, and 16 threads, and plot the program's speedup. (Note: twister has 8 processors with two hardware contexts in each processor, for 16 contexts in total.)
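The assignment does not define speedup further; the usual convention, assumed here, measures it against the 1-thread run of the same program. If T(p) is the measured time with p threads, the speedup S(p) and the parallel efficiency E(p) are:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}, \qquad p \in \{1, 2, 4, 8, 16\}
```

For example, hypothetical times T(1) = 8.0 s and T(8) = 1.25 s give S(8) = 6.4 and E(8) = 0.8. Because twister's 16 contexts come two per processor, the 16-thread point may fall well short of twice the 8-thread speedup.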

Some documentation from Dan Sorin (thanks Dan!)

On a CS research machine (i.e., twister), create a working directory, which I will assume is called my_pthreads. Copy my example code to your directory:
cp ~sorin/ece259-public/* my_pthreads/
Move to the my_pthreads directory (cd my_pthreads) and you will find the following files:
add.C // this is the main file - it adds all of the numbers in an input file
timer.h // this has a helpful routine for finding the current time
in.txt // this is a sample input file
Build the program on the machine twister.cs.duke.edu (no makefile is necessary): g++ -pthread -o add add.C
Run the program on twister.cs.duke.edu (which has 8 processors and 16 thread contexts): ./add <N> in.txt
You have just run the program on N threads using in.txt as the input file.
To find out how long it took to run, you can also use the Unix time command (which should report about the same time as the program reports for itself): time ./add <N> in.txt

As an aside, I recommend that you use a library timing function to measure only the parallel portion of your program. I'll leave it to you to read about how to do this.

Programming #2 Due Jan 30

MPI version of matrix multiply. We'll run this on the CS research cluster.

This will submit your parallel MPI job to a set of 16 machines. You can check the status of your job with the qstat command (remember that you need to be logged in to nicl.cs.duke.edu); when it returns nothing, your job has finished. Use qdel to remove unwanted or hung jobs from the queue (e.g., qdel -u userid).

The program output is in the file MPI_Job.o<PID>, and stderr output is in MPI_Job.e<PID>. The other files are generated by the batch submission tool.

To change the program or the number of nodes you are executing on, you change one of two lines in the mpi.sh file. The line containing the number 16 (#$ -pe mpich 16) specifies the number of nodes to use. Simply change this to 8 to run on 8 nodes (try it with the hello program).

To change the program that you are running, change the last line of mpi.sh (mpirun -np $NSLOTS -machinefile $TMPDIR/machines hello), replacing hello with the name of your program.

Projects:

The project is an individual or group (two-person) semester-long project whose quality should be close to that of a full research paper. The project requires:

·         Written proposal (3 pages maximum), Due Feb 27

·         Written (and possibly oral) progress report (3 pages maximum), Due March 26

·         Final written document (12 pages maximum, in conference/journal paper format)

·         Final oral presentation in class April 14 & 16