Geoff Sutcliffe's Overview of

# Automated Theorem Proving

## What is Automated Theorem Proving?

Automated Theorem Proving (ATP) deals with the development of computer programs that show that some statement (the conjecture) is a logical consequence of a set of statements (the axioms and hypotheses). ATP systems are used in a wide variety of domains. For examples, a mathematician might prove the conjecture that groups of order two are commutative, from the axioms of group theory; a management consultant might formulate axioms that describe how organizations grow and interact, and from those axioms prove that organizational death rates decrease with age; a hardware developer might validate the design of a circuit by proving a conjecture that describes a circuit's performance, given axioms that describe the circuit itself; or a frustrated teenager might formulate the jumbled faces of a Rubik's cube as a conjecture and prove, from axioms that describe legal changes to the cube's configuration, that the cube can be rearranged to the solution state. All of these are tasks that can be performed by an ATP system, given an appropriate formulation of the problem as axioms, hypotheses, and a conjecture.

The language in which the conjecture, hypotheses, and axioms (generically known as formulae) are written is a logic, often classical 1st order logic, but possibly a non-classical logic and possibly a higher order logic. These languages allow a precise formal statement of the necessary information, which can then be manipulated by an ATP system. This formality is the underlying strength of ATP: there is no ambiguity in the statement of the problem, as is often the case when using a natural language such as English. Users have to describe the problem at hand precisely and accurately, and this process in itself can lead to a clearer understanding of the problem domain. This in turn allows the user to formulate their problem appropriately for submission to an ATP system.

The proofs produced by ATP systems describe how and why the conjecture follows from the axioms and hypotheses, in a manner that can be understood and agreed upon by everyone, even other computer programs. The proof output may not only be a convincing argument that the conjecture is a logical consequence of the axioms and hypotheses, it often also describes a process that may be implemented to solve some problem. For example, in the Rubik's cube example mentioned above, the proof would describe the sequence of moves that need to be made in order to solve the puzzle.

ATP systems are enormously powerful computer programs, capable of solving immensely difficult problems. Because of this extreme capability, their application and operation sometimes needs to be guided by an expert in the domain of application, in order to solve problems in a reasonable amount of time. Thus ATP systems, despite the name, are often used by domain experts in an interactive way. The interaction may be at a very detailed level, where the user guides the inferences made by the system, or at a much higher level where the user determines intermediate lemmas to be proved on the way to the proof of a conjecture. There is often a synergetic relationship between ATP system users and the systems themselves:

• The system needs a precise description of the problem written in some logical form,
• the user is forced to think carefully about the problem in order to produce an appropriate formulation and hence acquires a deeper understanding of the problem,
• the system attempts to solve the problem,
• if successful the proof is a useful output,
• if unsuccessful the user can provide guidance, or try to prove some intermediate result, or examine the formulae to ensure that the problem is correctly described,
• and so the process iterates.
ATP is thus a technology very suited to situations where a clear thinking domain expert can interact with a powerful tool, to solve interesting and deep problems. Potential ATP users need not be concerned that they need to write an ATP system themselves;  there are many ATP systems readily available for use. At the 1st order level some well known and successful systems are Otter, E, SPASS, Vampire, and Waldmeister. Higher order systems include ACL2, Coq, HOL, and Nqthm. For examples of problems written in classical 1st order logic, the TPTP problem library is a useful resource, which also has a simple interface for users to try out many 1st order ATP systems.

Fields where ATP has been successfully used include logic, mathematics, computer science, engineering, and social science; some outstanding successes are described below. There are potentially many more fields where ATP could be used, including biological sciences, medicine, commerce, etc. The technology is waiting for users from such fields.

## What has Automated Theorem Proving been Really Useful for?

Many significant problems have been, and continue to be, solved using ATP. The fields where the most notable successes have been achieved are mathematics, software creation and verification, and hardware verification, and knowledge based systems.

The most exciting recent success in mathematics has been the settling of the Robbins problem by the ATP system EQP. In 1933 Herbert Robbins conjectured that a particular group of axioms form a basis for Boolean algebra, but neither he nor anyone else (until the solution by EQP) could prove this. The proof that confirms that Robbins' axioms are a basis for Boolean algebra was found October 10, 1996, after about 8 days of search by EQP, on an RS/6000 processor. This result was reported in the New York Times.

In the mathematical domain of quasi-groups there have been several successes using ATP systems. Otter has been used by the mathematician Ken Kunen to prove several results in quasi-groups. Fujita, Slaney, and Bennett (the first two being ATP researchers, the last a mathematician) decided many quasi-group problems using a system built at ICOT in Japan. William McCune and colleagues have used Otter to find minimal axiom sets, find new single axioms, and solve other interesting problems, in a range of algebraic structures. Recently the ATP system Waldmeister has been embedded into Mathmatica, extending Mathematica's already uniquely powerful algebraic theorem-proving capabilities. ATP is being used at Charles University to cross-verify the Mizar Mathematical Library. In the higher order setting NUPRL helped to confirm Higman's lemma and Gerard's paradox, both of which were under active investigation by humans at the time. The specialist geometry prover Geometry Expert, of Chou, Gao, and Zhang, has been used to obtain new results in Euclidean geometry.

Software creation is an economically important real world application of ATP. Although the use of ATP in software creation is in its infancy, there have already been some interesting results. The KIDS system developed at Kestrel Institute has been used to derive scheduling algorithms that have outperformed currently used algorithms. KIDS provides intuitive, high level operations for transformational development of programs from specifications. The Amphion project, sponsored by NASA, is used to determine appropriate subroutines to be combined to produce programs for satellite guidance. By encapsulating usable functionality in software components (e.g., subroutines, object classes), and then reusing those components, Amphion can develop software of greater functionality in less time than human programmers, with some assurance that the overall system is correct because it is built up from trusted components The Certifiable Synthesis project at NASA goes even further, generating code with annotations that can be used to extract ATP problems whose solution ensures various safety properties of the code.

Software verification is an obvious and attractive goal, which is slowly being realized using ATP technology. Three examples are given here, while many more are indexed at the Formal Methods page. The Karlsruhe Interactive Verifier (KIV) was designed as an experimental platform for interactive program verification at the University of Karlsruhe, and has since been used to successfully verify a range of software applications. These include some case studies of academic software, e.g., implementation od set functions, tree and graph representation and manipulation, and verification of a Prolog to WAM compiler. The KeY Project in Europe integrates formal software specification and verification into the industrial software engineering processes. The starting point is a commercial CASE tool, which is augmented by capabilities for formal specification and verification. The ultimate goal is to make the verification process transparent for the user with respect to the informal object-oriented model. PVS is a verification system that has been used in various applications, including diagnosis and scheduling algorithms for fault tolerant architectures, and requirements specification for portions of the space shuttle flight control system. Escher Techologies is a company that researches, develops and delivers tools for the efficient construction of provably correct software for mission-critical, business-critical or safety-critical systems.

Hardware verification is the largest industrial application of ATP. IBM, Intel, and Motorola are among the companies that employ ATP technology for verification. A few good examples of the use of ATP systems for hardware verification are listed here, while many more are indexed at the Formal Methods page. The ACL2 system has been used to obtain a proof of the correctness of the floating point divide code for AMD's PENTIUM-like AMD5K86 microprocessor, while ANALYTICA has been used to verify a division circuit that implements the floating point standard of IEEE. PVS (mentioned above in the context of software verification) has been used to verify a microprocessor for aircraft flight control. The RRL system has verified commercial size adder and multiplier circuits. The HOL system has been used at Bell Laboratories for hardware verification. Nqthm has been used to produce a mechanical proof of correctness of the FM9001 microprocessor. Jasper Design Automation, Inc is a company with a mission of making full formal IC verification a competitive advantage for its customers. The company’s flagship product, JasperGold Verification System, is the first verification product to deliver systematic complete verification, and accomplishes this task within predictable, finite schedule constraints.