next up previous contents
Next: Mathematics Education Up: Longer-Term Opportunities Previous: Program Synthesis

Mathematics

The AD field has been concerned with finding mathematical proofs for most of its forty years of existence. (Concept formation, the heart of mathematics to mathematicians, is far more difficult to automate than proof discovery. Only small beginnings have been made on the concept formation challenge.) Although the Robbins algebra proof discovery stands out at present, we have noted earlier that several provers have been used to solve open mathematical problems. Their logical power ranges from propositional logic to higher-order logic. Some researchers in the AD field, at least in the United States, think that AD research support for mathematics now should go primarily to solving open mathematics problems. The merit in this is that a focus then is set for funding at a time when little funding seems likely in automated mathematics in the United States. Some other AD researchers strongly oppose this view. They point out that open problems have been solved in certain types of mathematics only: discrete mathematics, logic and algebraic systems primarily. In particular, the area of continuous mathematics has had no new theorem proved by fully automated means. Funding only systems that have solved some open problem may halt progress in addressing continuous mathematics, for example. A way of assessing quality without the mandate to solve open problems is by performance comparison with other provers having the same goals. One means for doing this occurred in 1996 in the form of a competition among first-order logic provers in several categories. The ``TPTP competition'', named for the Thousands of Problems for Theorem Provers database containing at least 3000 problems with full axiom sets, was held at the 1996 Conference on Automated Deduction and enjoyed admirable planning, publicity (within the field), and contest controls [101] [100]. Such competitions take great resources to assemble and run; much credit goes to Geoff Sutcliffe (James Cook University, Australia) and Christian Suttner (TUM, Munich, Germany). It is important that this competition thrive and be recognized for the value it adds to the development of our field. It needs to broaden to include different types of provers, such as higher-order logic provers and induction provers, although efforts may be under way already to extend the competition.

The value of such a competition is shown by the surprise win of E-SETHEO, a Connection Tableaux and Model Elimination-based theorem prover from Munich [78], in one major category of the 1996 competition.[*] That E-SETHEO fared so well against Otter, still the standard to measure uniform first-order provers, establishes that E-SETHEO warrants further investment and development. (By ``uniform'' prover we mean one does not have specialized procedures for selected domains. An exception is equality, which is often taken as a primitive in traditional first-order logics.) (We remark that Otter also won in one category [73].) In another category the German system SPASS (Max Plank I., Saarbrücken) was the winner [104]. This strong performance of two provers from Germany speaks well for the capabilities developing in Europe in this field. This systems capability, and strong conceptual capability as well--measured by accepted papers at the Conference on Automated Deduction--is not a surprise to members of the research field. The German deduction community won a six-year award of roughly $15 million (U.S.) in 1992 for basic research directed towards applications in programming (the Schwerpunkt Deduktion project). This award allowed expansion of an already strong research base. The recent emergence of applications for AD systems and formal methods suggests their timing for increased support of automated deduction may be excellent.

The problem of proving theorems in the continuous mathematics domain was mentioned above, and we comment briefly here on this area. Many important theorems concerning continuous functions, for example, can be proven as logical consequences of other theorems. Definitions such as continuity for functions have reasonable logical encodings. However, in this domain most proofs, even for theorems of modest difficulty, demand substantial reasoning capabilities. Particularly needed are enhanced automated reasoning capabilities in set theory, or in the closely related type theories of higher-order logics. Good first steps have been made in first-order logic [11] and higher-order logic [85], [87], [2]. (The first two papers on higher-order logic systems consider the interactive system Isabelle, while the last paper discusses the system TPS directed toward automatic proofs.) As good as this work is, the accomplishments do not come close to providing the reasoning capabilities needed. In particular, the combinatorics of possible instantiations of variables to values explode when dealing with sets or higher-order formulas (which can be values for higher-order variables). In another direction, reasoning that is routine to humans regarding the real numbers is beyond our reach at present, with a few striking exceptions. One notable example was work done by Bledsoe and Hines in the development of a variable elimination method for inequality formulas and a manner of chaining together inequalities [12], which they incorporated in the system STR+VE [47]. The resultant theorem prover works in the first-order logic setting, using resolution, but replaces resolution proofs handling inequalities with a system as capable in most respects as a human in simplifying systems of inequalities. Much more work of this type is needed and should be supported, but such results are difficult to develop. It seems clear that interactive and semi-automatic AD systems, most likely using higher-order logic formulations, will provide useful mathematical assistance; but new approaches or even major breakthroughs are needed before fully automated AD systems can tackle difficult or open problems in continuous mathematics.

One single step that will strengthen current theorem provers considerably, whether interactive or fully automated, is the incorporation of computer algebra systems. This does not benefit provers uniformly, of course, but only those for which algebraic manipulation is part of the proof task. Still, within the pertinent domains the effect will be dramatic. Clarke and Zhao demonstrated the power of an enhanced algebraic simplifier coupled with AD proving techniques by proving fully automatically a number of summation equations of Ramanujan [25]. The theorem provers were Analytica and a successor prover. The Ramanujan results are beyond the capabilities of other present provers. (Other interesting work, involving series summation using a proof planning method and ``rippling'', is considered below.)

The combination of computer algebra and theorem-proving technologies also can be used to overcome present shortcomings of the computer algebra systems. These algebraic manipulators are unsound by such faults as dividing by zero without acknowledging the fact or not checking termination conditions. (It is not as simple as checking for a zero, of course; some systems execute symbolic computations, and then the condition must be retained explicitly.)

The automation of mathematical induction remains a central problem, as noted several times. In particular, we need to better handle the choosing of induction hypotheses and induction variable. Specialized cases continue to be identified and custom-handled, and induction systems continue to improve in performance, but we lack general techniques. We cannot expect global, powerful techniques that match in flavor the universal enumeration proof procedures that exist for first-order logic. The best we can hope for is techniques that handle certain problem classes. One such promising insight, called rippling [18], originated in work by Bundy and colleagues at Edinburgh. It is a syntactic strategy that takes advantage of similarities between the induction step consequent and the induction hypothesis and executes a monotonic series of local manipulation steps to change one to the other. It has shown surprising robustness, even finding use in proof planning [103]. But too few paradigms like rippling are known at present. Support of research on induction is important if broader principles are to be discovered and developed. Like the exploitation of computer algebra and AD as discussed above, the improvement of induction automation will have strong effects on verification work as well as general mathematics.

Part of the environment of proof discovery is counterexample discovery, the finding of an example that refutes a conjecture. This is receiving attention with model generation schemes and procedures. counterexample discovery is not new to AD technology, having been undertaken at Argonne with the predecessor of Otter, for example [105]. Recently, this task has received more attention, from use of the Davis-Putnam procedure for satisfiability checks to model generation use in the quasigroup work [94] [40]. Efforts are needed to better incorporate counterexample production with proof search, where the counterexample is used to truncate a false probe in the proof search.

An apparent opportunity that we have never been able to exploit to the degree imagined in the early days of AD research is the use of semantics to guide the proof search. It was used elegantly in the Geometry Theorem Proving Machine, where the geometry diagram was used to eliminate many irrelevant search candidates [43]. It quickly became clear that there were few problem domains that had a problem representation language at the same granularity as the proof language. The use of problem semantics on a broader scale now seems to occur primarily in the use of decision procedures for various subdomains. Decision procedures are being incorporated in provers, especially in verification provers. We should continue to address the production of decision procedures on whatever problem domains we deem of some interest. They will not be attention-getters, such as Presburger arithmetic or Tarski's real closed fields procedure, but simply utilitarian for a job on hand. They then should be shared with other developers in the sense of a common Web site where such procedures and the domain they encompass can be announced. Like heuristics for induction variable choice, on such mundane items is much progress gradually made.

The Robbins algebra accomplishment is reinforcement of the value of continued research in equational reasoning. The system EQP uses associative-commutative (AC) unification, a fancy way of pattern matching where order and grouping of term occurrences do not matter [98]. This is a property of equality reasoning in particular. In the presence of variables, it is a very difficult process to control. Another feature used by EQP is a sophisticated version of paramodulation, the equational inference rule mentioned earlier. Like AC-unification, the paramodulation rule is explosive in the possibilities it generates. Extensive research, much of it arising first in the term-rewriting community, has led to the sharp trimming of possibilities that need be considered. Without the prior research that produced AC-unification and controlled the combinatorial explosion of equational inference in general, the Robbins algebra problem would not have been solved. Although the term-rewriting community is at the center of the equational reasoning efforts, and support there is important, work outside that community also takes place, as the original paramodulation work makes clear.

Because so much emphasis has been placed on directed research in this report, we belabor the point that basic, often purely theoretical, work must be supported for the field to move forward. In addition, we must not be too eager to leave the ``old'' behind, feeling that old areas have been explored and only new ideas from new areas will contribute to progress. Strong evidence of these points comes in the above paragraph. The paramodulation inference rule was introduced in the 1960s, yet it is the fairly recent work on basic paramodulation [5] that was incorporated in EQP. Also, the AC-unification work was done in the early 1970s but not used to this point as much as it now surely will be.

We mentioned that the term-rewriting community is central to the work in equational logics. This point warrants elaboration. Rewriting-based AD systems often work with equational logics using oriented equations, or rewrite rules. Although use of rewrite rules derived from equations is actually older than electronic computers, the rewrite theory and consequent practice were initiated by a paper by Knuth and Bendix [59]. This paper contains a powerful result: Given a set of equations (an equational logic) and appropriate added information, there exists a unique (under certain conditions) set of rules that define a decision procedure for the theory presented by the equations. The ``completion'' of the initial rule set to the extended decision procedure rule set often is troublesome in practice; but even when completion is impractical, the rules that are generated are powerful additions to the original set. Much significant work, such as the basic paramodulation results mentioned above, has followed over the years from this beginning. Many AD systems have been developed around these ideas directly; the Rewrite Rule Laboratory (RRL) system mentioned earlier is one example. Such AD systems rely extensively on rewriting as a deduction mechanism and also support sophisticated unification methods (e.g., AC-unification) for completion (McCune's EQP is one such system). Approaches for reasoning by induction and semantics of data structures using decision procedures have also been integrated, leading to powerful AD systems that can be also used for applications, including analysis of equational specifications and implementations. As an example, the latest version of RRL has been used by Kapur and M. Subramaniam to verify properties of the SRT division circuit as specified by Clarke et al. This work was done completely automatically and without a computer algebra system as was used by Analytica, Clarke and Zhao's system [52]. Many of the term-rewriting systems are being developed in Europe and are being used for tasks such as circuit verification as well as for mathematics. See [41] for sample systems descriptions, as well as recent papers on rewriting techniques. This line of research is important and definitely needs sustained support.


next up previous contents
Next: Mathematics Education Up: Longer-Term Opportunities Previous: Program Synthesis
Donald Loveland
12/29/1997