CPS 206 Fall 1999, Problem 3. Work in teams.
Estimating the Performance of a Fiber Program
A fiber program executing on a processor with F "functional units" can conceivably execute F instructions on every cycle. However, achieving this speed is difficult. The purpose of the system of methods described here is to estimate how closely this goal is achieved in an actual execution of the program.
Execution of the program proceeds as follows:
This allows up to F instructions, each from different fibers, to execute on each cycle. The goal of performance estimation is to predict how many total cycles execution of this program would take on the actual hardware, without detailed simulation.
Given a real program in execution, "raw data" from which the performance estimate can be computed can be derived, by "decorating" the program with method calls. A method call I(k) announces that the current fiber will execute a sequence of k instructions, ending with a branch instruction, next. (If necessary, instruction sequences that do not end with a branch can be announced, using a method call J(k), also). The programmer will, unfortunately, have to insert these method calls into the code by hand.
I suggest that you estimate the number of instructions in an expression, by the number of programmer names or constants appearing in that expression. Place the method call in a location where it will be executed once each time the sequence of instructions it describes is executed. When calls on subroutines appear, the cost of evaluating the arguments should be included in the instruction sequence preceding the call, and the call itself should be regarded as a branch. Inside the subroutine, the same sort of instruction count estimates will have to appear, also.
The raw data derived from these method calls consists of a sequence of pairs (K, F), where K is the number of instructions in one "block", and F is the number of the fiber executing these instructions. (To indicate a block that does not end with a branch, each pair can become a triple, with the third component of each triple indicating what sort of instruction block this one is.) The performance estimation program should accept any sequence of such objects, and perform a cycle-by-cycle simulation of the instruction execution cycle. A possible design for the performance estimator would cause it to step through the sequence of instruction-block descriptors, simulating one execution cycle at a time. Simulation of one cycle consists of determining how many instructions, IC, can be fetched on this cycle from the current instruction block, and storing an object IB representing IC instruction fetched from fiber F on cycle C. L cycles after the instructions in IB are fetched, they become "ready", and enter a pool of instructions that fiber F can execute. During the current cycle, the simulator can simulate execution of one instruction from each of up to F fibers, by reducing the count of ready instructions for each of those fibers by one.
If an instruction-block is described as ending with a branch, a new instruction block for the same fiber cannot be fetched until that branch is processed. In terms of the simulation, this means until no instructions remain unprocessed (either ready, or waiting), for this fiber. When an instruction-block is described as not ending with a branch, instruction fetching can proceed with no delay. In fact, if it is legal to fetch the next instruction-block, you can simulate fetching some instructions from that block (which may come from a different fiber) on the current cycle (if fewer than F instructions were fetched from the current instruction block). Here, the hardware being simulated is assumed to "buffer" ready the next sequence of ready instructions from each fiber in special hardware registers, so those instructions are immediately available to be included in a new IC object.
In a real program, some branch instructions, like subroutine calls, and the branch that "closes" a while loop, are perfectly predictable. You can describe a block of instructions that ends with one of these sorts of branches as "not" ending with a branch. Yet another type of instruction block descriptor might be used to describe a block of instructions that ends with a "to be predicted" branch. Simulating branch prediction is beyond the scope of the basic performance estimator program, but may be done for extra credit.
Design and build the performance estimator, including the methods I() and J(), as an addition to the Java classes supplied in ~raw/public/cps206/. Augment the Qs() method there to include calls on your I() and J(), and submit (a) an explanation of the workings of your performance estimator, and (b) a report of the number of cycles used for sorting all sequences of 5 integers requires, when parameters F=4, L=8, and you use P=10 fibers. Submit your Java files, and these reports (in straight text, please) by the due date.
Class discussion will decide how best to supply the simulation parameters to the simulation.