A Lexical Analyzer Designed for Efficient Fiber Execution
while (FR.D(I<N)) {/*
Unroll this loop */J(3,3);
lch = In[I++]; /*1, 2, 3 */
J(5,5);
gst = St[lst = gst][lch]; /* 4, 5, 6, 7, 8 */
I(4,4);
switch ( FR.D(Ac[lst][lch])) { /* 9, A, B, C */
case 0:
J(1,1);/*
...
}/*
end switch */}/*
end while */
/*
Variables whose name begins with "l" are local to each Fiber. The variables I, and gst are global, as are the tables St and Ac. When fibers execute in parallel, the statement
J(5,5);
gst = St[lst = gst][lch];
must execute atomically. Once the pipeline fills, the following pattern of execution would keep 2 functional units busy, while never executing any of the instructions 4-8 at the same time in both fibers:
Fiber a: 1 2 3 4 5 6 7 8 9 A B C 1 2 3 4 5 6 7 8 9 A B C
Fiber b: ----------1 2 3 4 5 6 7 8 9 A B C 1 2 3 4 5 6 7 8 9 A B C
The delay in the execution of instructions 4-8 of the "next" fiber until instruction 8 of the "previous" fiber completes is enforceable by the hardware "dependence" mechanism, since instructions 1-C of fiber a are all issued before any instructions from the first iteration of fiber b. I've assumed here that fiber a's second iteration occurs as soon as fiber b's first iteration ends.
This pattern would not change if more fibers were executed -- the later iterations of fibers a and b could be moved to new fibers, while the FUs remain busy.
*/
Below I give the fetch and execution pattern for 2 iterations of each of 3 fibers. F represents a cycle during which 2 instructions of this fiber are fetched, while the hexadecimal digits 1 through C stand for the execution of the correspondingly numbered instructions of the loop above. Note that just accidentally, instructions 4-8 of the fibers execute "sequentially", with instructions of that sequence in one fiber never executing at the same time as instructions from this "atomic" sequence in another.
FFFFFF123456789ABCFFFFFF123456789ABC
------FFFFFF123456789ABCFFFFFF123456789ABC
------------FFFFFF123456789ABCFFFFFF123456789ABC