Microsoft has made an extensive investment in the area of software model checking. Tools such as Slam, Zing, KIS, Terminator, and ESP are now being used both inside and outside of the company. In this talk I will describe some of the underlying automatic theorem proving infrastructure used by these verification tools. I will also describe some recent advances and findings in this area.

*Dr. Byron Cook is a researcher at Microsoft's research laboratory in
Cambridge, England. His research interest include automatic theorem
proving, model checking and programming language theory. *

Program certification aims to provide explicit evidence that a program meets a specified level of safety. This evidence must be independently reproducible and verifiable. We have developed a system, based on theorem proving, that generates proofs that auto-generated aerospace code adheres to a number of safety policies. For certification purposes, these proofs need to be verified by a proof checker. Here, we describe and evaluate a semantic derivation verification approach to proof checking. The evaluation is based on 109 safety obligations that are attempted by EP and SPASS. Our system is able to verify 129 out of the 131 proofs found by the two provers. Most proofs are checked completely in less than 30 seconds wall clock time, This shows that the proof checking task arising from a substantial prover application is practically tractable.

Requirements about the quality of medical guidelines can be represented using schemata borrowed from the theory of abductive diagnosis, using temporal logic to model the time-oriented aspects expressed in a guideline. Previously we have shown that these requirements can be verified using interactive theorem proving techniques. In this paper, we investigate how this approach can be mapped to the facilities of a resolution-based theorem prover, otter, and a complementary program that searches for finite models of first-order statements, MACE-2. It is shown that the reasoning that is required for checking the quality of a guideline can be mapped to such fully automated theorem-proving facilities. The medical quality of an actual guideline concerning diabetes mellitus 2 is investigated in this way.

Monodic temporal logic is the most general fragment of first-order logic for which sound and complete calculi have been developed so far, including resolution calculi. One such resolution calculus has been implemented in the system TeMP using the first-order theorem prover Vampire as a kernel for performing the main reasoning tasks of the system. In this paper, we describe the calculus underlying TeMP, its implementation, and our experiences with its application to the verification of parameterised cache coherence protocols. We present some preliminary observations about the use of a first-order theorem prover as kernel for a reasoning system for a more expressive logic.

This is a report about development and preliminary testing of the second version of the MPTP (Mizar Problems for Theorem Proving) system. The goal of this project is to make the large formal Mizar Mathematical Library (MML) available to current first-order automated theorem provers (ATPs).

This version of MPTP switches to a generic extended TPTP syntax that adds term-dependent sorts and abstract (Fraenkel) terms to TPTP. We describe these extensions and explain how they are transformed by MPTP to standard TPTP syntax using relativization of sorts and de-anonymization (``skolemization'') of abstract terms. Such transformations going into the structure of formulas are much easier to implement in Prolog than in Perl (used earlier), so the switch to the generic syntax also necessitated a switch in the problem-generating language. The system is now based solely on the newly available native XML format of Mizar articles, which is directly transformed using XSLT to the extended TPTP syntax, so no special-purpose MPTP exporter based on the Mizar implementation is necessary. This also makes export of full Mizar proofs much easier, and we discuss the TPTP syntax extensions for their encoding.

One of the goals of MPTP is ATP cross-verification of the Mizar library. The prerequisite for this is the cross-verification of the simplest Mizar inference steps. The feasibility of this has been tested on a small initial part ( ca. 40.000 problems) of the library. The methodology and results of this test are described here.

For some, the object of automated reasoning is the design and implementation of a program that offer sufficient power to enable one to contribute new and significant results to mathematics and to logic, as well as elsewhere. One measure of success rests with the number and quality of the results obtained with the assistance of the program in focus. A less obvious measure (heavily in focus here) rests with the ability of a novice, in the domain under investigation, to make significant contributions to one or more fields of science by relying heavily on a given reasoning program. For example, if one, who is totally unfamiliar with the area of study but skilled in automated reasoning, can discover with an automated reasoning program impressive proofs, previously unknown axiom dependencies, and far more, then the field of automated reasoning has indeed arrived. This article details such - how one novice, with much experience with W. McCune's program OTTER but no knowledge of the domains under investigation, obtained startling results in the study of areas of logic that include the BCSK logic and various extensions of that logic. Among those results was the discovery of a variety weaker than has been studied from what we know, a variety that appears to merit serious study, as, for example does the study of semigroups when compared with that of the study of groups. A quite different result concerns the discovery of a most unexpected dependency in two extensions of the BCSK logic.

SAT solving is important not only because of its theoretical interest, but also because it has gained success as a practical tool for problem solving. In this talk I will give an overview of my (and my colleagues') contributions to the research field.

The SAT solver MiniSAT was designed to be simple and extensible, yet similar in efficiency and architecture compared to other state-of-the-art SAT solvers. It supports incremental SAT solving, where a sequence of similar SAT problems can be expressed relative to their predecessors. The problems are solved more efficiently because the solver can reuse work from earlier runs. Another feature is the possibility to extend the solver programmatically with other types of boolean constraints, for instance 0-1 linear inequalities. In recent work, we improved MiniSAT by conflict clause simplification, preprocessing based on variable elimination and subsumption, and some changes to the variable order heuristics (i.e., black magic). At the time of writing, the current MiniSAT is doing very well in the 2005 edition of the SAT competition.

On the user end of SAT solving, there is often much to gain by choosing the right type of encoding for a particular problem. As a simple example, propositional formulas may be encoded to CNF by (or-)distribution, or by introducing new propositions for each formula, or a combination of the two. The resulting CNFs may give radically different runtime s when fed to a SAT solver. We have studied several applications of SAT, that can be seen as more involved examples of encoding techniques, including Model Checking, First Order Model Finding, and 0-1 Optimization.

with Optional Inductive Proof

We outline Tau, a practical and extensible hybrid theorem prover for first-order predicate calculus with identity. Tau is flexible and user-configurable, accepts the KIF Language, is implemented in Java, and has multiple user interfaces. Tau combines rule-based problem rewriting with Model Elimination, uses Brand's Modification Method to implement identity, and accepts user-configurable heuristic search to speed the search for proofs. Tau optionally implements mathematical induction. Formulas are input and output in KIF or infix FOPC, and other external forms can be added. Tau can be operated from a Web interface or from a command-line interface. Tau is implemented entirely in Java and can run on any system for which a current Java Virtual Machine is available.

This paper describes the implementation of the `Logiweb' system with emphasis on measures taken to support classical reasoning about programs. Logiweb is a system for authoring, storing, distributing, indexing, checking, and rendering of `Logiweb pages'. Logiweb pages may contain mathematical definitions, conjectures, lemmas, proofs, disproofs, theories, journal papers, computer programs, and proof checkers. Reading Logiweb pages merely requires access to the World Wide Web. Two example pages are available on http://yoa.dk/. Writing, checking, and publishing Logiweb pages requires Logiweb to be downloaded and installed. Logiweb comes with a hierarchy of features: Lemmas and proofs are stated in a theory named `Map Theory', Map Theory is implemented on top of a calculus named `Logiweb sequent calculus', and Logiweb sequent calculus is implemented on top of the `Logiweb reduction system' (a version of lambda calculus). The Logiweb reduction system is implemented in the Logiweb software which is currently implemented in Common Lisp. The levels above the Logiweb reduction system are defined on Logiweb pages, allowing users to use the features as they are or to define and publish new ones on new Logiweb pages. As an example, a user may want to use ZFC in place of Map Theory, in which case the easiest approach is to publish a Logiweb page that defines ZFC in Logiweb sequent calculus and proceed from there. The `base' page on http://yoa.dk/, which is 180 pages long when printed out, was checked in 40 seconds. This is non-trivial to achieve for a proof checker implemented in lamdba calculus. The Logiweb sequent calculus is defined on the base page mentioned above. A user who wants to define e.g. ZFC set theory on top of that may publish a new page, call it `zfc', and let the `zfc' page reference the `base' page. That makes all definitions in the `base' page available to the `zfc' page. After that, another user may state and prove lemmas about e.g. real numbers on a third page, call it `real', which references the `zfc' page. When the proofs on the `real' page are checked, Logiweb will arrange that the `zfc' and `base' pages are available in a predigested form suitable for proof checking. Seen from the point of view of proof checking and publication, the World Wide Web has the drawback that once submitted pages can be modified after submission. In the example above, modification of the `base' page could ruin the correctness of the `real' page. To avoid problems with pages being modified, Logiweb implements its own referencing system which forces immutability upon once submitted pages. Once a Logiweb page is submitted, it cannot be changed, just like papers cannot change after publication. When a Logiweb page is submitted, a unique Logiweb `reference' is computed from its contents. The Logiweb system allows to look up a Logiweb page given its reference. Once a Logiweb page is submitted, it may be moved and duplicated such that its http url may change and such that a page may be available many places in the world under different urls, but the Logiweb reference remains constant. One of the tasks of Logiweb is keep track of the relation between the fixed references and the associated, fluctuating set of http urls.

The Knuth-Bendix Ordering (KBO) is one of the term orderings in widespread use. We present a novel algorithm to compute KBO, which is (to our knowledge) the first asymptotically optimal one. Starting with an ``obviously correct'' version we use the method of program transformation to develop stepwise an efficient version, making clear the essential ideas, while retaining correctness. By theoretical analysis we show that the worst-case behavior is thereby changed from quadratic to linear. Measurements show the practical improvements of the different variants.