CPS 104 Problem P6
MYMIPS: My version of MIPS
This problem requires your team to write an "interpreter", in C, for a new version of the MIPS machine which I have designed. I call the machine MYMIPS. An interpreter is a computer program which reads the instructions of some machine, and then "obeys" each instruction the way the real machine would. This is similar to what SPIM does.
The MYMIPS machine allows very few instruction types. Each instruction is composed of several small integers, whose binary representation is concatenated together, to form a single integer. These integers are stored in the array MEMORY in which programs live, and are selected one at a time from that array by the interpreter. The interpreter decodes each instruction, and performs the action specified by that instruction. The interpreter maintains a special register, called PC, which gives the index in MEMORY of the instruction currently being executed. MEMORY is byte-addressed, so successive words of MEMORY are indexed by successive multiples of 4. The action performed by an instruction can involve changing one or more registers of the simulated MYMIPS CPU, or one element of MEMORY. One field of each instruction specifies the instruction's type (the "Op-code"). The remaining fields, named a, b, d, imm, and i in the definitions below, are "parameters" of the instruction, and may hold the numbers of one or more registers on which the instruction operates, extensions of the op-code, or small numerical constants, which can be combined with a register's contents to assist in the computation. The registers occupy array R, which is separate from MEMORY.
Just as in MIPS, register 0 will always hold 0, and your interpreter should enforce this in some way, possibly by setting R[0]=0 before simulating each instruction.
Here are the instruction types and definitions:
|
Op- code |
Name |
Definition |
|
|
0 |
syscall |
Action(eu()) |
See table |
|
1 |
Load |
R[d] = MEMORY[e() + R[a]] (word); PC=PC+4 |
|
|
2 |
Load Byte |
R[d] = MEMORY[e() + R[a]] (byte); PC=PC+4 |
|
|
3 |
Store |
MEMORY[e() + R[a]] = R[d] (word); PC=PC+4 |
|
|
4 |
Store Byte |
MEMORY[e() + R[a]] = R[d] (byte); PC=PC+4 |
|
|
5 |
Add |
R[d] = R[a] + e(); PC=PC+4 |
|
|
6 |
Subtract |
R[d] = R[a] - e(); PC=PC+4 |
|
|
7 |
Shift |
R[d] = R[a] << e(); PC=PC+4 |
See below |
|
8 |
And |
R[d] = R[a] & eu(); PC=PC+4 |
|
|
9 |
Or |
R[d] = R[a] | eu(); PC=PC+4 |
|
|
10 |
Xor |
R[d] = R[a] ^ eu(); PC=PC+4 |
|
|
14 |
Branch |
if (Op(d,R[a])) PC = e(); else PC=PC+4 |
|
|
15 |
Branch+link |
R[15] = PC+4; PC = e() |
|
Instructions 11 through 13 are reserved for future expansion of MYMIPS. They should be simulated by termination of program execution, with an informative message. All fields not used by an instruction should be 0, to provide for possible future expansion of this Instruction Set Architecture.
The function e() uses the fields i, b, and imm of the current instruction to compute one signed value. Its definition is:
e() = (i==1)?sign-extend(imm):R[b];
That is, if field "i" is 1, the value of e() is the contents of field "imm", sign-extended to 32 bits; otherwise, the value of e() is R[b], the contents of the register selected by the "b" field of the instruction. The function eu() is similar to e(), but extends its immediate-field operand with zeros, rather than with copies of its high-order bit:
eu() = (i==1)? imm &0x7FFFF : R[b];
The "word" load and stores move the contents of the 4 bytes at MEMORY[x..x+3] to or from the target register. The value of "x" must be a multiple of 4, or the program is terminated with a meaningful message. The bytes may, on any given MYMIPS machine, be loaded and stored in either big-endian or little-endian order.
The load byte instruction copies one unsigned byte into the target register's low-order 8 bite, and fills the rest of the register with 0. The store byte instruction stores the low-order 8 bits of the target register into the addressed byte of MEMORY.
All arithmetic, including the e() and eu() calculations, are done in 32-bit 2's complement integer binary arithmetic. MYMIPS has no floating-point arithmetic defined.
The "shift" instruction performs a right-shift, if e()<0. The right-shift sign-extends R[a].
The function Op() determines whether a branch instruction "branches" by examining the "d" field of the current instruction, and using this field's value to select a rule for computing a C "true or false" expression. Usually this expression compares x against 0, but a d value of 7 selects a "do not branch" operation, while a d value of 0 causes the instruction to "always branch".
Op(d,x) is defined by the following table:
|
d |
Op(d,x) |
|
|
0 |
1 |
(always branch) |
|
1 |
x<0 |
|
|
2 |
x==0 |
|
|
3 |
x<=0 |
|
|
4 |
x>0 |
|
|
5 |
x!=0 |
|
|
6 |
x>=0 |
|
|
7 |
0 |
(never branch = no-op) |
The instruction "syscall" has actions specified by the following table. After performing the specified action, the program continues at the instruction following the syscall:
|
eu() |
action |
|
|
0 |
break |
(used by the debugger) |
|
1 |
printf("%d",R[2]) |
R[2] holds the integer to be printed |
|
4 |
printf("%s",&MEMORY[R[2]]) |
R[2] holds the address of the string |
|
5 |
scanf("%d",&R[1]) |
R[1] will get the integer read |
|
6 |
fgets(&MEMORY[R[2]],R[3],stdin) |
R[2] holds the address of the string area, R[3] holds the length of the string area |
|
10 |
exit |
|
Syscall changes no registers, other than R[1] and R[2].
Actions corresponding to eu() values of 5 and 6 return -1 in R[2] if end-of-file was encountered, and otherwise set R[2] to 0. These actions read from stdin. Output actions print on stdout. Action 6 reads characters from stdin into MEMORY[R[2]..R[2]+R[3]-2] until R[3]-1 characters are read, or a newline character is read and transferred to MEMORY[], or an end-of-file condition is encountered. A null character ('\0') is then added after the last of these characters.
The fields of an instruction occupy the bit positions shown by the following table, where bit 31 is the highest-order bit of a 32 bit word, and bit 0 is the lowest-order bit. Each field's length in bits is also shown.
|
Field |
length |
start |
end |
|
Op-code |
4 |
31 |
28 |
|
d |
4 |
27 |
24 |
|
a |
4 |
23 |
20 |
|
i |
1 |
19 |
19 |
|
b |
4 |
3 |
0 |
|
imm |
19 |
18 |
0 |
Note that the b field and the imm field overlap. An instruction has either a b field (when i==0) or an imm field (when i==1), but never both.
When written as a hexadecimal number, the first 3 digits of the number can be interpreted easily as: Op-code, d, and a. The last 5 digits contain i and either b or imm. Thus the number 0x5238A2CC represents an instruction which can be written with spaces between its fields as: 5 2 3 1 0A2CC. This instruction has Op-code=5, d=2, a=3, i=1 and imm=0x0A2CC. According to the table, this is an Add instruction, which changes register 2 to R[3]+0x0A2CC. The number 0x52300004 is another Add instruction, with i=0. It performs the action R[2] = R[3] + R[4], since b=4 in this case.
To simplify addressing, array MEMORY will have 219 elements, and indices into MEMORY will be taken modulo 219 automatically. This rule implies that location 0 in MEMORY follows immediately after location 0x7FFFF. Furthermore, location i+0x7FFFF becomes location i-1 of MEMORY, and in general location i + 0x80000 - j of MEMORY becomes location i-j of MEMORY, because (i + 0x80000 - j) % 0x80000 = i-j. Registers hold 32 bits, however.
Write an interpreter for the MYMIPS machine, in C. This program simulates the execution of the MYMIPS machine, one instruction at a time, by fetching an instruction from MEMORY[PC], splitting the instruction into its component fields, and performing the action selected by the instruction's op-code field (using that instruction's values of parameters a, b, d, i, and imm).
Before beginning to simulate a program, the interpreter must "load" that program into memory. The interpreter expects to read input lines from stdin, each containing a single hexadecimal integer (without the usual "0x" preceding them). Each line represents the contents of the next sequential word of memory, where the first line specifies the contents of MEMORY[0]. The interpreter will load these lines into MEMORY.
Following the lines representing MEMORY words, a line containing the "END" directive will appear. This line has a different form. The END directive is a line containing 2 integers, a -1, followed by an integer "N". The value of "N" gives the the initial value the PC should have (location in memory where simulation begins). After reading the END directive, the interpreter should immediately begin simulating MYMIPS. Requests to read input should obtain data from any lines on stdin which follow the END directive, and all output should be directed to stdout.
Testing
Submit with your program a short program written in MYMIPS input language. This program should read sequences of 3 integers from stdin, and operate on them as follows:
Suppose the 3 integers read are k, i, and j. The program should print a line holding 4 decimal numbers, separated by single spaces: k, i, j, and f(k,i,j), in that order. f is defined as follows:
|
k |
f(k,i,j) |
|
0 |
i+j |
|
1 |
i-j |
|
2 |
min(i,j) |
|
3 |
max(i,j) |
|
4 |
i << j |
The operator "<<" in the table above is the MYMIPS "sla" operation.
When the value of k read is <0, or >4, the program should exit quietly. Otherwise the program should continue reading and processing 3-integer sequences as specified above.
Collaboration
Teams of two people who have not worked together as a team in this class before should collaborate on this problem, and submit one solution per team. Note that the members of a team should be familiar with all parts of the team's solution. Include a file named "mailto", holding each team member's mail address, using the team members "basic" login id on a duke machine, preferably acpub.
Assembler
I have written an assembler for MYMIPS, which may be helpful to you in writing the test program. Unfortunately, I have not been able to test it extensively, and therefore cannot guarantee that it is completely correct. You may use it if you wish, but you should be suspicious of what it does -- if your interpreter gets a funny error in trying to run a program, MAYBE the assembler has assembled an instruction incorrectly, or MAYBE the test program has an error in it, or MAYBE your interpreter is wrong. Please report errors in the assembler to me by e-mail.
I have also modified the "hex" program originally written for SPIM, so it can be assembled by mymipsasm. The assembler source for this program is called "hex.myasm", and the assembler's output is called "hex.mymips". I believe your interpreter should be able to execute hex.mymips. It requires that the input data follow the program on stdin, to run successfully. Unfortunately, even if your interpreter runs hex.mymips successfully, there is no assurance that your interpreter is completely correct. You should create other programs as test cases for it yourself.
An assembler can be helpful in writing programs, compared to writing them in hexadecimal, since the assembler can read a somewhat understandable notation like:
add $2, $3, $4
and translate it into the proper encoded form for the machine instruction the notation describes, here, 0x52300004. The most useful assembler feature allows the programmer to "label" instructions and data locations. The assembler makes two "passes" over the assembly source file. During the first pass, the assembler computes the location in MEMORY each instruction and datum will occupy; during this pass, location labels are "defined" to have values equal to the locations the item they label will occupy. During the second pass, the values of these defined labels replace each occurrence of such a label in a ".word" or instruction field. A "hash table" is usually used to associate "labels" with their values. My assembler also pre-defines names like $t0, $ra, and so on. See a table in the assembler's documentation. Note: MYMIPS only has 16 registers, not the 32 that MIPS has, so not all MIPS names are defined.
C Esoterica
Certain parts of the interpreter require special knowledge of some details of C which are not otherwise required in this course. I present some code below which you may use, or modify, in your programs.
First, your interpreter will need to maintain an array, which I call here MEMORY, to hold the instructions and data of the program being simulated. This array must be accessed both as "char" (single bytes) and "int" (integers or instructions. The hardware of a real machine has no trouble doing this, but C has to be fooled into allowing it. Here's one method:
|
char M[HUGE]; |
/* Makes referencing bytes from M simple. */ |
|
int word, adr; |
/* Suppose "adr" holds an integer multiple of 4. */ |
|
word = *((int *) &M[adr]); |
/* Forms M[adr], M[adr+1], M[adr+2], and M[adr+3] into an integer, and loads that integer into "word". */ |
|
*((int *) &M[adr]) = word; |
/* Stores the integer held in "word" into bytes M[adr], M[adr+1], M[adr+2], and M[adr+3]. |
The method above works by creating a "char" pointer, &M[adr], and "casting" that pointer into a pointer to an int, using the "cast" (int ). The initial "*" tells C to reference whatever that pointer points to. C then uses integer instructions to reference the pointed-to data.
Second, decoding the input to the interpreter requires reading "lines" from stdin, and determining if the line contains 1, or 2 hexadecimal integers (or some other wrong number of integers). This can best be done using two library routines, "fgets()" and "sscanf()". Code that should work is:
int x=1, flag, word, loc=0;
|
char buffer[256]; |
/* You can assume that each input line is no longer than 256 characters */ |
while ((x==1)&&(fgets(buffer, 256,stdin)) {
if ((x=sscanf(buffer,"%x%x", &word, &flag))==1) {
/* Got one integer on the line */
*((int *) &M[loc]) = word:
loc+=4;
}
}
if (x==2) {
/* Last line contained 2 numbers. */
PC = flag;
}
else error("input line format");
The fgets() routine returns a non-zero value if it reads a line successfully, and a zero otherwise; sscanf() returns the number of items it converted correctly.
Errors
If your interpreter detects an error condition, it should print a message using the format:
"Fatal error at PC = %06X: %s\n", and fill in the value of the program counter, and a short error message like the one above. The interpreter should then execute "exit(1);" to terminate execution. If the interpreted program completes with no errors that the MYMIPS hardware could detect, the interpreter should simply return from main(). Only output generated by the program being interpreted, and a possible single error message as above should appear in the interpreter's output.