Arithmetica
Expression Trees are trees in which the terminal nodes (leaves) represent variables or constants in an expression and the non-terminal (internal) nodes represent the operators. The inherent hierarchical nature of trees precisely capture operator precedence and provide a convenient mechanism for storing, traversing, and transforming expressions. Note, that unlike binary trees, each node of an Expression Trees can have any number of children, easily modeling operators with one, two, or even arbitrary numbers of arguments.
Expression Trees were first introduced in 1967 with the programming language LISP, in which the program itself is stored as an expression tree, called an s-expression, that can be modifed even while the program is running! Since then, expression trees have formed the heart of many applications: representing genes, regular expressions, language grammars, and queries. They are so widespread, that the World Wide Web Consortium (W3C) decided to use them as the core of the XML Document Object Model (DOM) specification, more than forty years after their invention.
The trickiest part about using expression trees is actually creating them from a given formatted input string. Parsing is the process of taking string organized by a grammatical structure (that defines key characters to act as separators between the characters that are important, e.g., whitespace or semi-colon characters in Java) and producing a list of tokens in an order that is easy to convert into an expression tree. During the parsing process, the important sequences of characters are converted into token objects, while the separators are typically thrown out. If the input string contains grammatical errors such that it cannot be parsed, then an error is reported.
Specification
In pairs, write an arithmetic interpreter that allows users to enter, change, combine, and evaluate integer arithmetic expressions interactively (such as that provided by Google). A user of your program should be able to enter expressions interactively while your program keeps track of and evaluates them. The user can enter expressions such as those described below. Your program should save each expression entered as an Expression Tree so that it can be evaluated, combined, transformed, or printed whenever needed without requiring you to re-parse the user's input.
Each time the user enters a complete expression by pressing return, it should be evaluated and the value displayed for the user, then the user should be prompted to enter another. This is the so-called read-eval-print loop used in Lisp, Smalltalk, Python and other interpreted programming languages. The user can signal the end of their session by inputting a only period, ".", on the line by itself. The interpreter should ignore spaces, lines of input that are empty, i.e., contain only zero or more spaces, and lines that begin with the comment character "#".
An example session might look like the following (the user's input is in bold). Note, especially how the values of variables are determined and updated:
Welcome to Arithmetica 1-> 3 + 7 * 2 17 2-> ( 3 + 7 ) * 2 20 3-> a = 2 * 4 * 2 16 4-> b = 16 16 5-> c = a - b 0 6-> b = 2 * 16 32 7-> c -16 8-> a = ~ b -32 9-> c -64 10-> z 0 11-> .
Expressions
Your program should recognize the following expressions:| expression |
|
semantics |
|
|---|---|---|---|
| Constant |
[0-9]+ |
a non-negative integer constant |
87
113 0 |
| Variable |
[a-zA-z_]+ |
an expression represented by a word that is not otherwise used as an operator or function |
a
bugs total_squished |
| Assignment |
|
assigns an expression tree to a variable |
a = 87
bugs = 12 * a a = b |
| Unary Operator |
|
prefixes an expression
~ // negation |
~ 13
~ ( a + b * c ) ~ bugs |
| Binary Operator |
|
combines two expressions into a single expression (in precedence order)
^ // power |
a + b
( 3 + a ) ^ 2 a % 2 a + 10 * c |
| Unary Functions |
|
a function that takes an expression as its single argument
abs // absolute value rand // random value between [0 .. num) |
abs ( a * b )
rand ( x ) - y * 2 |
| Multi-Argument Functions |
|
a function that takes one or more expressions as arguments (each separated
by a comma)
sum max |
sum ( a , 1 , b + 1 )
max ( 1 , 2 , 3 , 4 , 5 ) sum ( a , max ( b , c ) ) |
| Parentheses |
|
raises an expression's precedence |
( a + b ) * 3
abs ( 3 * ( bugs + t ) ) |
Operators have the following precedence from left to right (listed from highest to lowest):
| () | parentheses |
| ~ | unary operators |
| ^ | arithmetic operators |
| *, /, % | arithmetic operators |
| +, - | arithmetic operators |
| = | assignment |
Refactoring
An incomplete Java solution is provided that correctly implements binary operators between constant values. It also attempts to implement parentheses, but there is one bug in that part of the code. This solution uses a standard algorithm to convert an infix expression into a postfix expression, which is then easy to convert into an Expression Tree.
Refactor the code such that the program's main structures (i.e., trees, parsing, evaluating) are a closed core that is open to extensions (i.e., different syntax and sematics) without being modified. In order to do this, find the program's key abstractions and build your core only from these abstractions. To proceed, I strongly suggest you build in small steps, practice testing and refactoring as you go, and add features one at a time focusing on how to simplify the current code when implementing new features instead of simply adding more special cases to support each new idea.
Extensions
There are many extensions to the basic specifications possible; some are listed below (note, you can add more operators to the set listed above, but that will not be worth as much extra credit (but may lead to more interesting images)). From the stand point of your grade, the most important thing is that your program is designed well (i.e., there is a clear separation between the syntax and semantics of the expression language and that it is clearly possible to change either by adding only O(1) line to your existing code). This means your design should be open to adding new kinds of expressions while closed to changing the evaluation and parsing code. The requirements above and suggestions below are intended to help you to realize such a design.
- recognize expressions without spaces between key tokens
- use Java's BigInteger class instead of
intto represent the result of an expression - allow postfix unary operators, like ! to represent factorial
- allow users to print the previous expression using the no-argument function
print()(note, this should not change the previous expression, so another call toprint()would produce the same results) - allow users to refer to the previous expression within another expression using the variable name
ans(note, since this variable would be updated with each input, this can lead to strange results) - allow users to refer to history of expressions in this session when making new expressions by entering their history number prefixed by "!" (i.e.,
!12refers to the twelfth expression entered, or!-1refers to the previous expression entered) - allow users to use the evaluated value of a variable, rather than its expression tree, in an expression by prefixing it with "$" (i.e., the expression,
3 + $ans, would evaluate the previous expression as part of the expression, creating an expression tree of just three nodes) - allow spaces in variable names
- prune the expression trees based on useless arithmetic branches (e.g., multiplication by 1 or 0)
- prune sub-expressions that do not depend on variables by replacing them with a single value
- recognize identical sub-expressions to avoid recalculating them when necessary
Note that the amount of extra credit will be in proportion to how clearly it follows the goals of your design. Contorting your main design to add in extra functionality or adding poor code that diminishes the rest of your refactoring efforts will not be rewarded. Thus, a well-tested, perfectly working program that has fewer features (but plenty of clear paths to easy expansion) is always worth more than the leaky kitchen sink. In short, to maximize your grade, you should implement enough variety in your program to clearly demonstrate that your design supports further such extensions.
The name arithmetica was suggested by former Duke Professor Michael Littman as a pun on the popular mathematics interpreter Mathematica.