Interpreting tokenized code: Problem PB
Extend your tokenizer so that, instead of printing the table of tokens, it parses and “interprets” statements in the following simple language. (By “interpret”, I mean perform the action specified by each operator on the fly, on the current values of each variable, and number.)
Language C--:
<program> ::= <statement> | <program>; <statement>
<statement> ::= <assignment> | <print>
<print> ::= print <expr>
<assignment> :: set NAME = <expr>
<expr> ::= <term> | <expr> + <term> | <expr> - <term>
<term> ::= <prim> | <term> * <prim> | <term> / <prim>
<prim> ::= NAME | NUMBER | (<expr>)
This says that a <prim> is either a NAME, or a NUMBER, or a parenthesized <expr>. A <term> is either a <prim>, or a sequence of <prim>s separated by either * or /. If a <term T> is <term T1> * <prim P>, then val(T) = val(T1)*val(P). Here “val()” represents a function which computes the value of the sequence of tokens recognized as its argument. <expr>s are evaluated in a similar fashion to <term>s, and val(“(“<expr E> “)”) = val(E).
The easy way to evaluate a <program> (which means “execute” it), is to write a collection of functions, one for each “non-terminal” of the language. (The non-terminal’s names are enclosed in “<” and “>” in the definition of C-- above.) The function named “X” is supposed to read the longest sequence of tokens it can recognize as an <X>. Function “X” then returns the value of that sequence of tokens. This value can be used in computing the value of a longer sequence of tokens, which might be recognized as some different token <Y>. All these routines are mutually recursive. If one of them looks for a specific kind of character, but does not find it, the routine should report an error, giving the line number, and continue looking, until it finds what it was looking for.
The code below assumes you have written a routine “nt()” which copies the current token’s “type” to global variable Typ, and that token’s “val” (actually, an integer identifying the specific member of this token’s “type”: The specific operator of hash table index for a specific NAME, etc) into Tvl.
The code follows a convention that the first token of the thing being “recognized” is already “displayed” in Typ and Tvl. Whenever the code “processes” a specific token, it saves information about that token in LOCAL variables, and then “scans the token away” by calling nt(). You have to start the whole recognition process by calling nt(), before calling the routine to recognize <statements>.
The routine for <term> is typical:
int term() {
int p, t, op;
p=prim();
while (Typ==MULTOP) {
op = Tvl;
nt();
t = prim();
if (op == ‘*’) p *= t;
else p /= t; (You probably should check for t==0 first.)
}
(Typ and Tvl will remain unchanged, describing this token again on next call)
return p;
}
The routine for <expr> is very similar. For <assignment>, look for a NAME (error if none found), an “=”, and call expr().
Set the value of the proper variable to the value returned by <expr>, and return. For <print>, print the value of the <expr> as an integer, followed by a newline. You can replace the sequence of if-tests in <term> by a switch, if you don’t think that is confusing (it is faster, but speed isn’t very important here).