Entrants' System Descriptions

Ayane 2

Russell Wallace
Independent researcher, Ireland


Ayane is an automated theorem prover for first-order logic. It uses refutation by saturation, converting the input to clause normal form and then generating additional clauses until a contradiction is found or no further inferences can be performed. Inference is based on the superposition calculus.


The current version uses a straightforward implementation of the given clause algorithm, with clauses selected for processing in order of size (smallest first) and basic filtering of generated clauses for duplication and tautologies.
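
The loop described above can be sketched as follows (a hedged illustration; the clause representation and the inference interface are our own simplifications, not Ayane's actual C# code):

```python
import heapq

def is_tautology(clause):
    # A clause is a tautology if it contains a literal and its negation
    # (literals are modelled here as strings, "~" marking negation).
    return any(("~" + lit) in clause for lit in clause)

def given_clause_loop(initial_clauses, generate_inferences):
    """Saturate a clause set; return True on refutation (empty clause)."""
    processed = []
    unprocessed = []
    seen = set()
    for c in initial_clauses:
        heapq.heappush(unprocessed, (len(c), sorted(c)))
    while unprocessed:
        _, given = heapq.heappop(unprocessed)    # smallest clause first
        given = frozenset(given)
        if not given:
            return True                          # empty clause: contradiction
        if given in seen or is_tautology(given):
            continue                             # filter duplicates/tautologies
        seen.add(given)
        for new in generate_inferences(given, processed):
            heapq.heappush(unprocessed, (len(new), sorted(new)))
        processed.append(given)
    return False                                 # saturated without refutation
```

Plugging in a propositional resolution rule as `generate_inferences` is enough to refute {p} and {~p}; a superposition prover supplies a far richer inference function but the same control structure.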


Ayane is written in C#, and has been tested on Linux (Mono 2.4.4; it should work with later or somewhat earlier versions) and Windows (.NET); it is expected to work on any platform that supports a reasonably up-to-date version of the CLR.

The system is available from


Expected Competition Performance

Ayane is not expected to win the competition yet, as it is still in early development and much work on efficient inference remains to be done.

Currahee(E,iProver) 0.1

Matthias Schmalz, Jann Röder
ETH Zurich, Switzerland


Currahee(E,iProver) 0.1 applies a number of axiom selection strategies to a problem in parallel, and passes the resulting smaller problems to E and iProver. Currahee(E,iProver) relies on the assumption that several axiom selection strategies and theorem provers are more successful than one axiom selection strategy with one theorem prover. The challenge was to find out the strengths of the individual selection strategies, i.e., when to apply which strategy.


Currahee(E,iProver) applies the following axiom selection strategies (with slight adjustments):
For a given problem, the strategies are chosen depending on the problem's size.


Problems are parsed using Andrei Tchaltsev's TPTP parser (in Java). The axiom selection strategies have been reimplemented in Java. The underlying theorem provers are E 1.1 and iProver 0.8pre.

After the competition, Currahee(E,iProver) can be obtained from:


Expected Competition Performance

Based on preliminary measurements, we expect Currahee(E,iProver) to solve at least half of the problems from last year's LTB division (applying this year's resource restrictions).

Darwin 1.4.5

Peter Baumgartner
NICTA, Australia


Darwin [BFT04,BFT06] is an automated theorem prover for first-order clausal logic. It is an implementation of the Model Evolution calculus [BT03]. The Model Evolution calculus lifts the propositional DPLL procedure to first-order logic. One of the main motivations for this approach was the possibility of migrating to the first-order level some of the very effective search techniques developed by the SAT community for the DPLL procedure.
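
The propositional DPLL procedure that Model Evolution lifts can be sketched as follows (a minimal illustration with unit propagation and splitting; the signed-integer encoding of literals is our own, not Darwin's):

```python
def dpll(clauses, assignment=None):
    """Propositional DPLL: clauses are sets of ints, negative = negated."""
    if assignment is None:
        assignment = {}
    clauses = [set(c) for c in clauses]
    # Unit propagation: repeatedly commit to the literal of a unit clause
    while True:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        lit = next(iter(unit))
        assignment[abs(lit)] = lit > 0
        new = []
        for c in clauses:
            if lit in c:
                continue             # clause satisfied, drop it
            c = c - {-lit}           # remove the falsified literal
            if not c:
                return None          # conflict: backtrack
            new.append(c)
        clauses = new
    if not clauses:
        return assignment            # all clauses satisfied: model found
    # Splitting rule -- the inference Model Evolution lifts to first order
    lit = next(iter(clauses[0]))
    for choice in (lit, -lit):
        result = dpll(clauses + [{choice}], dict(assignment))
        if result is not None:
            return result
    return None
```

Model Evolution replaces the finite truth assignment with a context of first-order literals, but keeps this propagate-split-backtrack skeleton.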

The current version of Darwin implements first-order versions of unit propagation inference rules, analogous to a restricted form of unit resolution and subsumption by unit clauses. To retain completeness, it includes a first-order version of the (binary) propositional splitting inference rule.


Proof search in Darwin starts with a default interpretation for a given clause set, which is evolved until either a model or a refutation is found. Darwin traverses the search space by iterative deepening over the term depth of candidate literals, and employs a uniform search strategy for all problem classes.


The central data structure is the context. A context represents an interpretation as a set of first-order literals. The context is grown by using instances of literals from the input clauses. The implementation of Darwin is intended to support basic operations on contexts in an efficient way. This involves the handling of large sets of candidate literals for modifying the current context. The candidate literals are computed via simultaneous unification between given clauses and context literals. This process is sped up by storing partial unifiers for each given clause and merging them for different combinations of context literals, instead of redoing whole unifier computations. For efficient filtering of unneeded candidates against context literals, discrimination tree or substitution tree indexing is employed. The splitting rule generates choice points in the derivation which are backtracked using a form of backjumping, similar to the one used in DPLL-based SAT solvers.

Darwin is implemented in OCaml and has been tested under various Linux distributions. It is available from

Changes to the previous version consist of minor bug fixes.

Expected Competition Performance

We expect its performance to be very strong in the EPR division.

E/EP 1.2pre

Stephan Schulz
Technische Universität München, Germany


E 1.2pre [Sch02,Sch04] is a purely equational theorem prover. The core proof procedure operates on formulae in clause normal form, using a calculus that combines superposition (with selection of negative literals) and rewriting. No special rules for non-equational literals have been implemented, i.e., resolution is simulated via paramodulation and equality resolution. The basic calculus is extended with rules for AC redundancy elimination, some contextual simplification, and pseudo-splitting with definition caching. The latest versions of E also support simultaneous paramodulation, either for all inferences or for selected inferences.

E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. Proof search in E is primarily controlled by a literal selection strategy, a clause evaluation heuristic, and a simplification ordering. The prover supports a large number of preprogrammed literal selection strategies. Clause evaluation heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).

The prover uses a preprocessing step to convert formulae in full first order format to clause normal form. This step may introduce (first-order) definitions to avoid an exponential growth of formulae. Preprocessing also unfolds equational definitions and performs some simplifications on the clause level.

EP 1.2pre is just a combination of E 1.2pre in verbose mode and a proof analysis tool extracting the used inference steps.


The automatic mode determines the literal selection strategy, term ordering, and search heuristic based on simple problem characteristics of the preprocessed clausal problem. E has been optimized for performance over the TPTP. The automatic mode of E 1.2pre is partially inherited from previous versions and is based on about 60 test runs over TPTP 4.0.1. It consists of the selection of one of about 40 different strategies for each problem. All test runs have been performed on Linux/Intel machines with a time limit roughly equivalent to 300 seconds on 300 MHz Sun SPARC machines, i.e., around 30 seconds on 2 GHz class machines. All individual strategies are refutationally complete.

E distinguishes problem classes based on a number of features, all of which have between 2 and 4 possible values. The most important ones are:

For classes above a threshold size, we assign the absolute best heuristic to the class. For smaller, non-empty classes, we assign the globally best heuristic that solves the same number of problems on this class as the best heuristic on this class does. Empty classes are assigned the globally best heuristic. Typically, most selected heuristics are assigned to more than one class.
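
The assignment policy described above can be sketched as follows (a hedged illustration; the data layout and names are our own, not E's actual code):

```python
def assign_heuristics(classes, solved, threshold):
    """
    classes: {class_name: [problem, ...]}
    solved:  {heuristic: set of problems it solves}
    Returns {class_name: heuristic} following the policy in the text.
    """
    # Global ranking: the heuristic solving most problems overall first
    global_best = sorted(solved, key=lambda h: -len(solved[h]))
    assignment = {}
    for cls, problems in classes.items():
        if not problems:                       # empty class: global best
            assignment[cls] = global_best[0]
            continue
        per_class = {h: len(solved[h] & set(problems)) for h in solved}
        best_count = max(per_class.values())
        if len(problems) > threshold:
            # large class: the absolutely best heuristic on this class
            assignment[cls] = max(per_class, key=per_class.get)
        else:
            # small class: globally best heuristic that matches the
            # class-local optimum
            assignment[cls] = next(h for h in global_best
                                   if per_class[h] == best_count)
    return assignment
```

The small-class rule trades a marginal local gain for a heuristic with a better global track record, which guards against overfitting tiny classes.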

For the LTB part of the competition, E will use a relevancy-based pruning approach and attempt to solve the problems with successively more complete specifications until it succeeds or runs out of time.


E is implemented in ANSI C, using the GNU C compiler. At the core is an implementation of aggressively shared first-order terms in a term bank data structure. Based on this, E supports the global sharing of rewrite steps. Rewriting is implemented in the form of rewrite links from rewritten to new terms. In effect, E is caching rewrite operations as long as sufficient memory is available. E uses perfect discrimination trees with age and size constraints for rewriting and unit-subsumption, feature vector indexing [Sch04] for forward- and backward subsumption and contextual literal cutting, and a new technique called fingerprint indexing for backward rewriting and (hopefully) paramodulation. Knuth-Bendix Ordering and Lexicographic Path Ordering are implemented using the linear and polynomial algorithms described by Bernd Löchner [Loe04,Loe06].

The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, MacOS-X, and various versions of Linux. Sources of the latest released version are available freely from:

EP 1.2pre is a simple Bourne shell script calling E and the postprocessor in a pipeline.

Expected Competition Performance

In recent years, E has performed well in most proof categories. We believe that E will again be among the stronger provers in the FOF and CNF categories. We hope that E will at least be a useful complement to dedicated systems in the other categories.

EP 1.2pre will be hampered by the fact that it has to analyse the inference step listing, an operation that is typically about as expensive as the proof search itself. Nevertheless, it should be competitive among the FOF systems.

E-Darwin 1.3

Björn Pelzer
Universität Koblenz-Landau, Germany


E-Darwin is an automated theorem prover for first-order clausal logic with equality. It is a modified version of the Darwin prover [BFT04], intended as a testbed for variants of the Model Evolution calculus [BT03]. Among other things, it implements the Model Evolution with Equality calculus [BT05]. Since the last version, a new calculus has also been implemented, using a different approach to incorporating equality reasoning into Model Evolution; this new calculus will be used for CASC this year.

Three principal data structures are used: the context (a set of rewrite literals), the set of constrained clauses, and the set of derived candidates. The prover always selects one candidate, which may be a new clause or a new context literal, and exhaustively computes inferences between this candidate and the context and clause set, moving the results to the candidate set. Afterwards the candidate is inserted into the context or the clause set, as appropriate, and the next candidate is selected. The inferences are superposition-based. Demodulation and various means of redundancy detection are used as well.


The uniform search strategy is identical to the one employed in the original Darwin, slightly adapted to account for derived clauses.


E-Darwin is implemented in the functional/imperative language OCaml. The system has been tested on Unix and is available under the GNU Public License, from:

Darwin's method of storing partial unifiers has been adapted to equations and subterm positions for the superposition inferences in E-Darwin. A combination of perfect and non-perfect discrimination tree indexes is used to store the context and the clauses.

Expected Competition Performance

The new calculus used this year differs significantly from those of the previous versions, which still had a lot in common with the original Model Evolution. As the implementation was completed only very recently, we do not yet know what performance to expect.

E-KRHyper 1.1.4

Björn Pelzer
Universität Koblenz-Landau, Germany


E-KRHyper [PW07] is a theorem proving and model generation system for first-order logic with equality. It is an implementation of the E-hyper tableau calculus [BFP07], which integrates a superposition-based handling of equality [BG98] into the hyper tableau calculus [BFN96]. The system is an extension of the KRHyper theorem prover [Wer03], which implements the original hyper tableau calculus.

An E-hyper tableau is a tree whose nodes are labeled with clauses, and which is built up by the application of the inference rules of the E-hyper tableau calculus. The calculus rules are designed such that most of the reasoning is performed using positive unit clauses. A branch can be extended with new clauses that have been derived from the clauses of that branch.

A positive disjunction can be used to split a branch, creating a new branch for each disjunct. No variables may be shared between branches, and if a case-split creates branches with shared variables, then these are immediately substituted by ground terms. The grounding substitution is arbitrary as long as the terms in its range are irreducible: the branch being split may not contain a positive equational unit which can simplify a substituting term, i.e., rewrite it with one that is smaller according to a reduction ordering. When multiple irreducible substitutions are possible, each of them must be applied in consecutive splittings in order to preserve completeness.

Redundancy rules allow the detection and removal of clauses that are redundant with respect to a branch.

The hyper extension inference from the original hyper tableau calculus is equivalent to a series of E-hyper tableau calculus inference applications. Therefore the implementation of the hyper extension in KRHyper by a variant of semi-naive evaluation [Ull89] is retained in E-KRHyper, where it serves as a shortcut inference for the resolution of non-equational literals.


E-KRHyper uses a uniform search strategy for all problems. The E-hyper tableau is generated depth-first, with E-KRHyper always working on a single branch. Refutational completeness and a fair search control are ensured by an iterative deepening strategy with a limit on the maximum term weight of generated clauses.
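
This iterative deepening scheme can be sketched as follows (a hedged illustration in Python; E-KRHyper itself is written in OCaml, and the term encoding, starting limit, and doubling policy here are our own assumptions):

```python
def term_weight(term):
    """Weight of a term given as nested tuples ('f', arg1, ...);
    a bare string is a variable or constant of weight 1."""
    if isinstance(term, str):
        return 1
    return 1 + sum(term_weight(t) for t in term[1:])

def iterative_deepening(search, start_limit=4, max_limit=64):
    """Re-run a depth-first search with a growing weight limit.

    `search(limit)` returns a proof, or None once it has exhausted the
    space of clauses below `limit`.  Restarting with a larger bound
    keeps depth-first search refutationally complete and fair: no
    heavy clause is postponed forever.
    """
    limit = start_limit
    while limit <= max_limit:
        result = search(limit)
        if result is not None:
            return result
        limit *= 2
    return None
```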


E-KRHyper is implemented in the functional/imperative language OCaml. The system runs on Unix and MS-Windows platforms and is available under the GNU Public License, from:

The system accepts input in the TPTP format and in the TPTP-supported Protein format. The calculus implemented by E-KRHyper works on clauses, so first-order formula input is converted into CNF by an algorithm which follows the transformation steps used in the clausification of Otter [McC03]. E-KRHyper operates on an E-hyper tableau which is represented by linked node records. Several layered discrimination-tree-based indexes (both perfect and non-perfect) provide access to the clauses in the tableau and support backtracking.

Expected Competition Performance

Most of the work done on E-KRHyper since the last version relates only to its operation as a reasoning server within the question-answering system LogAnswer [FGP10], and these modifications have no effect on CASC performance. Therefore we expect it to be comparable to Otter, like last year.

Geo 2010C

Hans de Nivelle
University of Wrocław, Poland


Geo is based on geometric resolution [dNM06,deN07,deN10]. A geometric formula is a formula without function symbols and constants, which has the form:
   FORALL x1, ..., xn
   [ A1 /\ ... /\ Ap /\ v1 != w1 /\ ... /\ vq != wq -> Z(x1,...,xn) ].
The Ai are atoms that are neither equality nor inequality atoms. The arguments of the atoms Ai and of the inequalities vj != wj must be among the variables x1, ..., xn. Z(x1,...,xn) must have one of the following three forms:
  1. The constant FALSE.
  2. A disjunction B1 \/ ... \/ Br, in which the atoms Bk are non-equality atoms, and the arguments of each atom Bk are among the variables x1,..., xn.
  3. An existential quantification EXISTS y B, in which y is a variable distinct from x1,...,xn, and B is a non-equality atom that has all its variables among x1,...,xn,y.
As input, Geo accepts geometric formulae and first-order formulae. First-order formulae are transformed into geometric formulae. During this translation, function symbols and constants are removed, and replaced by relations.
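
As a sketch of this translation (hedged; Geo's actual output may differ in detail), a fact P(f(x)) can be expressed without the function symbol f by introducing a relation F, where F(x,y) is read as "y is the value of f at x":

```
   FORALL x     [ -> EXISTS y F(x,y) ]                    (totality of f)
   FORALL x,y   [ F(x,y) -> P(y) ]                        (the original fact)
   FORALL x,y,z [ F(x,y) /\ F(x,z) /\ y != z -> FALSE ]   (functionality of f)
```

The three rules have forms 3, 2 (with r = 1), and 1, respectively, so the translated problem stays within the geometric fragment.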

Geo accepts formulae either in its own syntax or in TPTP syntax. Because CNF has no special status for Geo, TPTP formulae in CNF form are treated as ordinary first-order formulae.

During search, Geo searches for a model of the geometric formulae by a combination of backtracking and learning. Whenever it cannot extend an interpretation into a model, Geo learns a new rule of Type 1, which ensures that Geo will not explore similar models in the future.

In case Geo finds a finite model, it simply prints the model. In case no model exists, it will eventually learn the rule -> FALSE, so that it is able to output a proof.

The final aim of geometric resolution, and of Geo, is to obtain a prover that is both good at finding proofs, and at finding finite models.


Currently, Geo has only one strategy: It searches for a model by depth-first search, and learns a new lemma at each backtracking point. Lemmas never expire, but they are sometimes simplified into smaller lemmas.


Geo is written in C++. We try to keep the code readable and reasonably efficient at the same time. Currently, Geo has no sophisticated data structures for efficiency. Our priority lies with the theoretical understanding of geometric resolution, adding a rich type system with partial functions to Geo, implementing full natural deduction proof output, and adding other features that might be useful in interactive verification.

Expected Competition Performance

There is no significant difference from earlier versions of Geo. We expect to end somewhere in the middle.

iProver 0.7

Konstantin Korovin
University of Manchester, United Kingdom


iProver is an automated theorem prover based on the Inst-Gen instantiation calculus [GK03,Kor09], which is complete for first-order logic. One of the distinctive features of iProver is a modular combination of first-order reasoning with ground reasoning. In particular, iProver currently integrates MiniSat [ES04] for reasoning with ground abstractions of first-order clauses. In addition to instantiation, iProver implements an ordered resolution calculus and a combination of instantiation and ordered resolution; see [Kor08] for the implementation details. The saturation process is implemented as a modification of a given clause algorithm. iProver uses non-perfect discrimination trees for the unification indexes, priority queues for passive clauses, and a compressed feature vector index for efficient forward and backward subsumption and subsumption resolution. The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04,Kor08]; global subsumption [Kor08]; resolution-based simplifications; and propositional-based simplifications. Equality is dealt with (internally) by adding the necessary axioms of equality.


iProver has around 40 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT solver, simplifications and options for combination of instantiation with resolution. At CASC iProver will execute a small number of fixed schedules of selected options depending on general syntactic properties such as Horn/non-Horn, equational/non-equational, and maximal term depth.


iProver is implemented in OCaml and uses MiniSat for the ground reasoning. iProver accepts the FOF and CNF formats; in the case of the FOF format, the E prover is used for clausification. iProver is available from:

Expected Competition Performance

iProver 0.7 is the CASC-22 EPR division winner.

iProver(-SInE) 0.8

Konstantin Korovin
University of Manchester, United Kingdom


iProver is an automated instantiation-based theorem prover for first-order logic. We refer to the description of iProver 0.7 for general information. Major additions in the current version are: For the LTB division, axiom selection and clausification are done by Vampire 0.6. SInE scripts are used for problem scheduling, and iProver for reasoning.


iProver accepts FOF and CNF formats, where either Vampire [RV01] or E prover [Sch04] is used for clausification of FOF problems. iProver is available from:

Expected Competition Performance

iProver 0.8 is expected to have strong performance in FOF, CNF, FNT, SAT and LTB divisions, and to considerably outperform iProver 0.7 in the EPR division.

iProver-Eq(-SInE) 0.6

Konstantin Korovin, Christoph Sticksel
University of Manchester, United Kingdom


iProver-Eq [KS10] extends the iProver system [Kor08] with built-in equational reasoning, along the lines of [GK04]. As in the iProver system, first-order reasoning is combined with ground satisfiability checking, where the latter is delegated to an off-the-shelf ground solver.

iProver-Eq consists of three core components: i) ground reasoning by an SMT solver, ii) first-order equational reasoning on literals in a candidate model by a labelled unit superposition calculus [KS10] and iii) instantiation of clauses with substitutions obtained by ii).

Given a set of first-order clauses, iProver-Eq first abstracts it to a set of ground clauses which are passed to the ground solver. If the ground abstraction is unsatisfiable, then the set of first-order clauses is also unsatisfiable. Otherwise, literals are selected from the first-order clauses based on the model of the ground solver. The labelled unit superposition calculus checks whether the selected literals are conflicting. If they are conflicting, then clauses are instantiated such that the ground solver has to refine its model in order to resolve the conflict. Otherwise, satisfiability of the initial first-order clause set is shown.

Clause selection and literal selection in the unit superposition calculus are implemented in separate given clause algorithms. Relevant substitutions are accumulated in labels during unit superposition inferences and then used to instantiate clauses. For redundancy elimination iProver-Eq uses demodulation and dismatching constraints. In order to efficiently propagate redundancy elimination from instantiation into unit superposition, we implemented different representations of labels based on sets, AND/OR-trees and OBDDs. Non-equational resolution inferences provide further simplifications.

For the LTB division, axiom selection and clausification are done by Vampire 0.6. SInE scripts are used for problem scheduling and iProver-Eq for reasoning.


Proof search options in iProver-Eq control clause and literal selection in the respective given clause algorithms. Equally important is the global distribution of time between the inference engines and the ground solver. At CASC, iProver-Eq will execute a fixed schedule of selected options.

If no equational literals occur in the input, iProver-Eq falls back to the inference rules of iProver; otherwise those rules are disabled and only unit superposition is used. If all clauses are unit equations, no instantiations need to be generated and the calculus is run without the otherwise necessary bookkeeping.


iProver-Eq is implemented in OCaml and uses CVC3 [BT03] for the ground reasoning in the equational case and MiniSat [ES04] in the non-equational case. iProver-Eq accepts FOF and CNF formats, where either Vampire [RV01] or the E prover [Sch04] is used for clausification of FOF problems. iProver-Eq is available from

Expected Competition Performance

iProver-Eq has significantly improved over the version in the previous CASC. We expect reasonably good performance in all divisions, including the EPR divisions where instantiation-based methods are particularly strong.

Isabelle/HOL 2009-2

Jasmin C. Blanchette1, Lawrence C. Paulson2, Tobias Nipkow1, Makarius Wenzel1, Stefan Berghofer1
1Technische Universität München, Germany
2University of Cambridge, United Kingdom


Isabelle/HOL 2009-2 [NPW02] is the higher-order logic incarnation of the generic proof assistant Isabelle2009-2. Isabelle/HOL provides several automatic proof tactics, notably an equational reasoner [Nip89], a classical reasoner [PN94], a tableau prover [Pau99], and a first-order resolution-based prover [Hur03].

Although Isabelle is designed for interactive proof development, it is a little-known fact that it is possible to run Isabelle from the command line, passing in a theory file with a formula to solve. Isabelle theory files can include Standard ML code to be executed when the file is processed. The TPTP2X Isabelle format module outputs a THF problem in Isabelle/HOL syntax, augmented with ML code that (1) runs the ten tactics in sequence, each with a CPU time limit, until one succeeds or all fail, and (2) reports the result and proof (if found) using the SZS standards. A Perl script is used to insert the CPU time limit (equally divided over the ten tactics) into TPTP2X's Isabelle format output, and then run the command line isabelle-process on the resulting theory file.
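
The wrapper's control flow can be sketched as follows (a simplified illustration; the tactic interface and status strings are our own, and the real wrapper enforces each limit externally rather than trusting the tactic):

```python
def run_tactics(tactics, goal, total_limit):
    """Try each tactic in turn, giving each an equal share of the CPU
    budget, and report an SZS-style status with the winning tactic."""
    per_tactic = total_limit / len(tactics)
    for name, tactic in tactics:
        try:
            proof = tactic(goal, timeout=per_tactic)
        except Exception:
            proof = None                     # a failing tactic is skipped
        if proof is not None:
            return ("Theorem", name, proof)  # first success wins
    return ("GaveUp", None, None)            # all tactics failed
```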


The Isabelle/HOL wrapper submitted to the competition tries the following tactics sequentially:
  simp: Performs equational reasoning using rewrite rules.
  blast: Searches for a proof using a fast untyped tableau prover and then attempts to reconstruct the proof using Isabelle tactics.
  auto: Combines simplification and classical reasoning under one roof.
  metis: Combines ordered resolution and ordered paramodulation. The proof is then reconstructed using Isabelle tactics.
  fast: Searches for a proof using sequent-style reasoning, performing a depth-first search. Unlike blast and metis, it constructs proofs directly in Isabelle. That makes it slower, but enables it to work in the presence of the more unusual features of HOL, such as type classes and function unknowns.
  fastsimp: Combines fast and simp.
  best: Similar to fast, except that it performs a best-first search.
  force: Similar to auto, but more exhaustive.
  meson: Implements Loveland's MESON procedure [Lov78]. Constructs proofs directly in Isabelle.
  smt: Invokes the Z3 SMT solver [dMB08] developed at Microsoft Research, and reconstructs the proofs in Isabelle [Boh09].


Isabelle is a generic theorem prover written in Standard ML. Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of higher-order logic. The HOL object logic extends Pure with a more elaborate version of higher-order logic, complete with the familiar connectives and quantifiers. Other object logics are available, notably FOL (first-order logic) and ZF (Zermelo–Fraenkel set theory).

The implementation of Isabelle relies on a small LCF-style kernel, meaning that inferences are implemented as operations on an abstract theorem data type. Assuming the kernel is correct, all values of type theorem are correct by construction.

Most of the code for Isabelle was written by the Isabelle teams at the University of Cambridge and the Technische Universität München. A notable exception is the metis proof method, which was taken from the HOL4 theorem prover (also implemented in ML).

Isabelle/HOL is available for all major platforms under a BSD-style license, from


Expected Competition Performance

Results from last year would suggest that Isabelle will finish third in the THF category, after TPS and LEO-II. However, since last year, we have added the smt proof method, which we expect will increase our theorem count by about 50.

leanCoP(-SInE) 2.2

Jens Otten
University of Potsdam, Germany


leanCoP [OB03,Ott08] is an automated theorem prover for classical first-order logic with equality. It is a very compact implementation of the connection (tableau) calculus [Bib87,LS01].

For the LTB division, leanCoP-SInE runs leanCoP as the underlying inference engine of SInE 0.4, i.e., axiom selection is done by SInE, and proof attempts are done by leanCoP. SInE is developed by Kryštof Hoder; leanCoP-SInE is co-developed by Thomas Raths.


The reduction rule of the connection calculus is applied before the extension rule. Open branches are selected in a depth-first way. Iterative deepening on the proof depth is used to achieve completeness. Additional inference rules and strategies are regularity, lemmata, and restricted backtracking [Ott10]. leanCoP uses an optimized structure-preserving transformation into clausal form [Ott10] and a fixed strategy scheduling.


leanCoP is implemented in Prolog (ECLiPSe, SICStus, and SWI Prolog are currently supported). The source code of the core prover is only a few lines long and fits on half a page. Prolog's built-in indexing mechanism is used to quickly find connections.

leanCoP can read formulae using the leanCoP syntax as well as the (raw) TPTP syntax format. Equality axioms are automatically added if required. The core leanCoP prover returns a very compact connection proof, which is translated into a readable proof. Several output formats are available. As the main enhancement, leanCoP 2.2 now supports the output of proofs in the (proposed) TPTP syntax for representing derivations in connection (tableau) calculi [OS10].

The source code of leanCoP 2.2, which is available under the GNU general public license, together with more information can be found on the leanCoP web site:


Expected Competition Performance

As the core prover has not changed, we expect the performance of leanCoP 2.2 and leanCoP-SInE 2.2 to be similar to the performance of leanCoP 2.1 and leanCoP-SInE 2.1, respectively.

leanCoP-Omega 0.1

Jens Otten, Holger Trölenberg, Thomas Raths
University of Potsdam, Germany


leanCoP-Ω is an automated theorem prover for classical first-order logic with equality and with interpreted functions and predicates for integer arithmetic. It combines version 2.1 of the compact connection prover leanCoP [Ott08,Ott10] with version 1.2 of the Omega test system [Pug92].


The Omega test determines whether there is an integer solution to an arbitrary set of linear equalities and inequalities. It is based on an extension of Fourier-Motzkin variable elimination (a linear programming method) to integer programming, and has worst-case exponential time complexity [Pug92]. The Omega test is invoked from the leanCoP prover if equalities or inequalities are contained in the active path of a connection derivation.
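
The underlying variable elimination step can be sketched as follows (a hedged illustration over the rationals; the constraint encoding is our own, and the Omega test adds substantial extra machinery to obtain integer solutions):

```python
from fractions import Fraction

def fourier_motzkin(ineqs, var):
    """Eliminate `var` from a system of linear constraints, each given
    as (coefs, const) meaning  sum(coefs[v] * v) <= const."""
    lower, upper, rest = [], [], []
    for coefs, const in ineqs:
        a = Fraction(coefs.get(var, 0))
        if a == 0:
            rest.append((coefs, const))
            continue
        # divide by |a| so the constraint reads  var + e <= c  (a > 0)
        # or  -var + e <= c  (a < 0), with e free of `var`
        norm_coefs = {v: Fraction(c) / abs(a)
                      for v, c in coefs.items() if v != var}
        norm = (norm_coefs, Fraction(const) / abs(a))
        (upper if a > 0 else lower).append(norm)
    # adding a lower-bound to an upper-bound constraint cancels `var`
    for lc, lb in lower:
        for uc, ub in upper:
            combined = {v: lc.get(v, 0) + uc.get(v, 0)
                        for v in set(lc) | set(uc)}
            rest.append((combined, lb + ub))
    return rest
```

Eliminating all variables either yields only trivially true constraints (the system is rationally satisfiable) or exposes a contradiction such as 0 <= -2; over the integers the Omega test must additionally handle "dark shadow" cases where rational solutions exist but integer ones do not.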


The leanCoP prover is implemented in Prolog; see the leanCoP 2.2 description for more details. The Omega test system is implemented in C++. If leanCoP-Ω finds a proof it returns a very compact connection proof.

LEO-II 1.2

Christoph Benzmüller1, Frank Theiss2
1Articulate Software, USA, 2Saarland University, Germany


LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system based on extensional higher-order resolution. More precisely, LEO-II employs a refinement of extensional higher-order RUE resolution [Ben99]. LEO-II is designed to cooperate with specialist systems for fragments of higher-order logic. By default, LEO-II cooperates with the first-order ATP system E [Sch02]. LEO-II is often too weak to find a refutation amongst the steadily growing set of clauses on its own. However, some of the clauses in LEO-II's search space attain a special status: they are first-order clauses modulo the application of an appropriate transformation function. The default transformation is Hurd's fully typed translation [Hur03]. Therefore, LEO-II launches a cooperating first-order ATP system every n iterations of its (standard) resolution proof search loop (e.g., n = 10). If the first-order ATP system finds a refutation, it communicates its success to LEO-II in the standard SZS format. Communication between LEO-II and the cooperating first-order ATP system uses the TPTP language and standards.


LEO-II employs an adapted "Otter loop". In contrast to its competitor systems (such as Satallax, TPS, and IsabelleP) LEO-II so far only employs a monolithic search strategy, that is, it does not use strategy scheduling to try different search strategies or flag settings. However, LEO-II version 1.2 for the first time includes some very naive relevance filtering and selectively applies some simple scheduling for different relevance filters.


LEO-II is implemented in Objective Caml version 3.10, and its problem representation language is the new TPTP THF language [BRS08]. In fact, the development of LEO-II has largely paralleled the development of the TPTP THF language and related infrastructure [SB10].

The improved performance of LEO-II in comparison to its predecessor LEO (implemented in LISP) is due to several novel features including the exploitation of term sharing and term indexing techniques [TB06], support for primitive equality reasoning (extensional higher-order RUE resolution), and improved heuristics at the calculus level. One recent development is LEO-II's new parser: in addition to the TPTP THF language, this parser now also supports the TPTP FOF and CNF languages. Hence, LEO-II can now also be used for FOF and CNF problems. Unfortunately the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.

The LEO-II system is distributed under a BSD style license, and it is available from:


Expected Competition Performance

LEO-II (resp. its main developer C. Benzmüller) is currently on parental leave. As a result, LEO-II has not improved much over the last year (except for its parser), while competitors such as IsabelleP and the new Satallax prover seem to have gained significantly in strength.

This year LEO-II will also participate in the FOF and CNF categories in order to evaluate its performance on these fragments. Note that LEO-II still employs its own input processing and normalization techniques, and that calls to the prover E are applied only modulo Hurd's fully typed translation. The evaluation will thus reveal the performance loss in comparison to E, and it will likely point to relevant future work. Another reason for entering THF, FOF, and CNF is to demonstrate that integrated higher-order/first-order theorem provers are generally feasible.

Metis 2.2

Joe Hurd
Galois, Inc., USA


Metis 2.2 [
Hur03] is a proof tactic used in the HOL4 interactive theorem prover. It works by converting a higher order logic goal to a set of clauses in first order logic, with the property that a refutation of the clause set can be translated to a higher order logic proof of the original goal.

Experiments with various first order calculi [Hur03] have shown that a given clause algorithm and ordered resolution best suit this application, and that is what Metis 2.2 implements. Since equality often appears in interactive theorem prover goals, Metis 2.2 also implements the ordered paramodulation calculus.


Metis 2.2 uses a fixed strategy for every input problem. Negative literals are always chosen over positive literals, and terms are ordered using the Knuth-Bendix ordering with uniform symbol weight and precedence favouring reduced arity.
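The fixed strategy can be illustrated with a minimal given-clause sketch (illustrative Python, not Metis's Standard ML; the clause representation and the crude symbol-count size measure are assumptions):

```python
import heapq

# Sketch (not Metis code) of a given-clause loop with smallest-first clause
# selection and preference for negative literals, as described above.
# Clauses are frozensets of literal strings; a leading "~" marks negation.

def clause_size(clause):
    # Crude symbol count: split each literal on parentheses and spaces.
    return sum(len(lit.lstrip("~").replace("(", " ").split()) for lit in clause)

def select_literal(clause):
    # Negative literals are always chosen over positive literals.
    negatives = [l for l in clause if l.startswith("~")]
    return min(negatives) if negatives else min(clause)

def given_clause_loop(passive):
    active = []
    heap = [(clause_size(c), i, c) for i, c in enumerate(passive)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, given = heapq.heappop(heap)      # smallest clause first
        order.append((given, select_literal(given)))
        active.append(given)   # inferences with the Active set omitted here
    return order

cs = [frozenset({"p(a)", "~q(a)"}), frozenset({"r(a)"})]
print(given_clause_loop(cs))
```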


Metis 2.2 is written in Standard ML, for ease of integration with HOL4. It uses indexes for resolution, paramodulation, (forward) subsumption, and demodulation. It keeps the Active clause set reduced with respect to all the unit equalities so far derived.

In addition to standard age and size measures, Metis 2.2 uses finite models to weight clauses in the Passive set. When integrated with higher order logic, an interpretation of known functions and relations is manually constructed to make many of their standard properties valid in the finite model. For example, if the domain of the model is the set {0,...,7}, then the higher order logic arithmetic functions are interpreted in the model modulo 8. Unknown functions and relations are interpreted randomly, but with a bias towards making supporting theorems valid in the model. The finite model strategy carries over to TPTP problems, by manually interpreting a collection of functions and relations that appear in TPTP axiom files in such a way as to make the axioms valid in the model.
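The finite-model idea can be sketched as follows, assuming a domain {0,...,7} with known arithmetic symbols interpreted modulo 8 (the term representation and symbol table are illustrative, not Metis's):

```python
from itertools import product

# Sketch of finite-model validity checking (an assumed simplification of
# Metis's weighting scheme): known symbols are interpreted over the domain
# {0,...,7}, i.e. modulo 8; a clause valid in the model can be weighted lower.

DOM = range(8)

INTERP = {           # known symbols, interpreted modulo 8
    "plus": lambda x, y: (x + y) % 8,
    "times": lambda x, y: (x * y) % 8,
}

def eval_term(t, env):
    if isinstance(t, int):
        return t % 8
    if isinstance(t, str):
        return env[t]           # a variable
    f, *args = t
    return INTERP[f](*(eval_term(a, env) for a in args))

def valid_in_model(lhs, rhs, variables):
    # A universally quantified equation holds iff it holds for all assignments.
    return all(eval_term(lhs, dict(zip(variables, vs))) ==
               eval_term(rhs, dict(zip(variables, vs)))
               for vs in product(DOM, repeat=len(variables)))

# Commutativity of plus is valid modulo 8; "x + 1 = x" is not.
print(valid_in_model(("plus", "x", "y"), ("plus", "y", "x"), ["x", "y"]))  # True
print(valid_in_model(("plus", "x", 1), "x", ["x"]))                        # False
```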

Metis 2.2 reads problems in TPTP format and outputs detailed proofs in TSTP format, where each refutation step is one of 6 simple inference rules. Metis 2.2 implements a complete calculus, so when the set of clauses is saturated it can soundly declare the input problem to be unprovable (and outputs the saturation set).

Metis 2.2 is free software, released under the GPL. It is available from


Expected Competition Performance

There have been only minor changes to Metis 2.2 since CASC-22, so it is expected to perform at approximately the same level in CASC-J5 and end up in the lower half of the table. However, it is possible that Metis 2.2 might get a boost from its finite model clause weighting, since this year the CASC competition rules have changed so that function and relation names are no longer obfuscated.

MetiTarski 1.3

Larry Paulson
University of Cambridge, United Kingdom


MetiTarski [
AP10,Bro03] is an automatic theorem prover based on a combination of resolution and QEPCAD-B [Bro03], a decision procedure for the theory of real closed fields. It is designed to prove theorems involving real-valued special functions such as log, exp, sin, cos and sqrt. In particular, it is designed to prove universally quantified inequalities involving such functions. This problem is undecidable: MetiTarski is incomplete. MetiTarski is a modified version of Joe Hurd's theorem prover, Metis [Hur03].


MetiTarski employs resolution, augmented with axiom files that specify upper and lower bounds of the special functions mentioned in the problem. MetiTarski also has code to simplify polynomials and put them into canonical form. The resolution calculus is extended with a literal deletion rule: if the decision procedure finds a literal to be inconsistent with its context (which consists of known facts and the negation of the other literals in the clause), then it is deleted.


MetiTarski, like Metis, is implemented in Standard ML. QEPCAD is implemented in C and C++ (and unfortunately is very difficult to build on 64 bit machines). The latest version of MetiTarski can be downloaded from:

Muscadet 4.0

Dominique Pastre
University Paris Descartes, France


The Muscadet theorem prover is a knowledge-based system. It is based on Natural Deduction, following the terminology of [
Ble71] and [Pas78], and uses methods that resemble those used by humans. It is composed of an inference engine, which interprets and executes rules, and of one or several bases of facts, which are the internal representation of "theorems to be proved". Rules are either universal and put into the system, or built by the system itself by metarules derived from data (definitions and lemmas). Rules may add new hypotheses, modify the conclusion, create objects, split theorems into two or more subtheorems, or build new rules that are local for a (sub-)theorem.


There are specific strategies for existential, universal, conjunctive and disjunctive hypotheses and conclusions, and equalities. Functional symbols may be used, but an automatic creation of intermediate objects allows deep subformulae to be flattened and treated as if the concepts were defined by predicate symbols. The successive steps of a proof may be forward deduction (deduce new hypotheses from old ones), backward deduction (replace the conclusion by a new one), refutation (only if the conclusion is a negation), search for objects satisfying the conclusion, or dynamic building of new rules.
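The automatic creation of intermediate objects can be sketched as a flattening pass (a hypothetical Python representation, not Muscadet's Prolog): each functional subterm is named by a fresh object together with a defining atom, so deep subformulae are treated as if the concepts were defined by predicate symbols.

```python
# Sketch of the flattening described above (hypothetical representation):
# each functional subterm f(...) is replaced by a fresh intermediate object
# together with an atom app_f(args..., obj).

counter = 0

def fresh():
    global counter
    counter += 1
    return f"o{counter}"

def flatten(term, atoms):
    """Return a name for `term`, appending defining atoms to `atoms`."""
    if isinstance(term, str):          # a plain variable or constant
        return term
    f, *args = term
    flat_args = [flatten(a, atoms) for a in args]
    obj = fresh()
    atoms.append((f"app_{f}", *flat_args, obj))
    return obj

atoms = []
top = flatten(("f", ("g", "x")), atoms)
print(atoms, top)   # f(g(x)) becomes app_g(x,o1) and app_f(o1,o2)
```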

The system is also able to work with second order statements. It may also receive knowledge and know-how for a specific domain from a human user; see [Pas89] and [Pas93]. These two possibilities are not used while working with the TPTP Library.


Muscadet [Pas01] is implemented in SWI-Prolog. Rules are written as more or less declarative Prolog clauses. Metarules are written as sets of Prolog clauses. The inference engine includes the Prolog interpreter and some procedural Prolog clauses. A theorem may be split into several subtheorems, structured as a tree with "and" and "or" nodes. All the proof search steps are memorized as facts including all the elements which will be necessary to later extract the useful steps (the name of the executed action or applied rule, the new facts added or rule dynamically built, the antecedents, and a brief explanation).

Muscadet is available from:


Expected Competition Performance

Muscadet will perform best on problems manipulating many concepts in which all statements (conjectures, definitions, axioms) are expressed in a manner similar to the practice of humans, especially of mathematicians [Pas02, Pas07]. It will perform poorly on problems using few concepts but large and deep formulae leading to many splittings. Its best results will be in set theory, especially for functions and relations. Its originality is that proofs are given in natural style.

omkbTT 1.0

Sarah Winkler
University of Innsbruck, Austria


Based on ordered completion [
BDP89], omkbTT 1.0 is a theorem prover for unit equality problems. It stands out from classical completion tools in two respects: (1) automatic termination tools replace a fixed reduction order, and (2) omkbTT employs a multi-completion approach [KK99] to explore multiple branches of the search tree at once. A detailed description of the underlying inference system can be found in [WM10].


omkbTT keeps track of multiple ordered completion processes using different reduction orders developed during the deduction. The approach of multi-completion allows sharing of many inference steps among these processes. A deduction succeeds as soon as one of the processes does.

omkbTT is parameterized by a termination strategy for the termination checks performed internally, and a selection strategy to choose the equation to be processed next. To ensure that a saturated set of equations and rules is ground-complete, the default termination strategy comprises a number of termination techniques inducing total termination such as Knuth-Bendix orders, lexicographic path orders, polynomial interpretations, and multiset path orders. The default selection strategy first chooses a process for which the size of its current equation and constraint set is minimal. Then a node for this process is selected by considering the term size and timestamp, the latter to ensure fairness of the deduction.
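The default selection strategy can be sketched as follows (the data layout is assumed, not omkbTT's): first pick the process with the smallest equation and constraint set, then pick a node for it by term size and timestamp, the timestamp ensuring fairness.

```python
# Sketch of the default selection strategy described above (illustrative
# data layout, not omkbTT's OCaml structures).

def pick(processes, nodes):
    """processes: {pid: set_size}; nodes: [(pid, term_size, timestamp, node)]"""
    # 1. process whose current equation and constraint set is minimal
    best_pid = min(processes, key=processes.get)
    # 2. node for this process, by term size, then timestamp (fairness)
    candidates = [n for n in nodes if n[0] == best_pid]
    _, _, _, node = min(candidates, key=lambda n: (n[1], n[2]))
    return best_pid, node

procs = {"p0": 12, "p1": 7}
nodes = [("p1", 5, 2, "s=t"), ("p1", 5, 1, "u=v"), ("p0", 3, 0, "a=b")]
print(pick(procs, nodes))  # ('p1', 'u=v')
```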

Since in the setting with termination tools the reduction order developed along the way is not known in advance, the set of ordered critical pairs is not computable during the deduction. Instead, critical consequences are over-approximated using the embedding relation [WM10]. Furthermore, omkbTT uses multi-completion variants of critical pair criteria [BD88] to restrict the number of equational consequences, and isomorphic processes are filtered out to restrict the search space. Details about these optimizations can be found in [WS+10].


Due to the multi-completion approach, equations and rewrite rules as well as goals are represented using nodes. This data structure connects a pair of terms with the sets of processes which consider it as an equation or rule, respectively. The inference rules of omkbTT, working on sets of nodes, are combined in a DISCOUNT-like control loop. In order to accelerate the retrieval of adequate terms for rewriting and the computation of overlaps, omkbTT relies on code trees [Vor01] as its default term indexing technique.

omkbTT is written in OCaml. It interfaces the termination tool TTT2 [KS+09] internally to perform termination checks. Version 1.0 is available from:


Expected Competition Performance

As a prototype of the described approach, the system is intended to demonstrate the use of termination tools in a multi-completion setting. omkbTT is expected to solve most easy problems and a moderate number of the more difficult instances (but is not expected to win its division).

Otter 3.3

William McCune
Argonne National Laboratory, USA


Otter 3.3 [
McC03] is an ATP system for statements in first-order (unsorted) logic with equality. Otter is based on resolution and paramodulation applied to clauses. An Otter search uses the "given clause algorithm", and typically involves a large database of clauses; subsumption and demodulation play an important role.


Otter's original automatic mode, which reflects no tuning to the TPTP problems, will be used.


Otter is written in C. Otter uses shared data structures for clauses and terms, and it uses indexing for resolution, paramodulation, forward and backward subsumption, forward and backward demodulation, and unit conflict. Otter is available from:

Expected Competition Performance

Otter has been entered into CASC as a stable benchmark against which progress can be judged (there have been only minor changes to Otter since 1996 [MW97], nothing that really affects its performance in CASC). This is not an ordinary entry, and we do not expect Otter to do well in the competition.

Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and Rusty Lusk contributed to the development of Otter.

Paradox 3.0

Koen Claessen, Niklas Sörensson
Chalmers University of Technology, Sweden


Paradox [
CS03] is a finite-domain model generator. It is based on a MACE-style [McC03] flattening and instantiating of the first-order clauses into propositional clauses, and then the use of a SAT solver to solve the resulting problem.

Paradox incorporates the following features: Polynomial-time clause splitting heuristics, the use of incremental SAT, static symmetry reduction techniques, and the use of sort inference.


There is only one strategy in Paradox:
  1. Analyze the problem, finding an upper bound N on the domain size of models, where N is possibly infinite. Such a finite upper bound can be found, for example, for EPR problems.
  2. Flatten the problem, and split clauses and simplify as much as possible.
  3. Instantiate the problem for domain sizes 1 up to N, applying the SAT solver incrementally for each size. Report "SATISFIABLE" when a model is found.
  4. When no model of size smaller than or equal to N is found, report "CONTRADICTION".
In this way, Paradox can be used both as a model finder and as an EPR solver.
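The four steps can be sketched as follows; a brute-force model search stands in here for the MACE-style flattening and incremental SAT solving, but the outer loop over domain sizes is the same (all names are illustrative):

```python
from itertools import product

# A stand-in sketch for the strategy above: brute-force model search replaces
# SAT solving, but the outer loop is the same — try domain sizes 1 up to N,
# report SATISFIABLE on the first model, CONTRADICTION when none exists.

def holds(clause, env, ext):
    # A clause is a list of (sign, predicate, constant) literals.
    return any(ext[p][env[c]] == sign for sign, p, c in clause)

def solve(constants, preds, clauses, N):
    for size in range(1, N + 1):                       # step 3: sizes 1..N
        for assign in product(range(size), repeat=len(constants)):
            env = dict(zip(constants, assign))         # constants -> domain
            for rows in product(product([False, True], repeat=size),
                                repeat=len(preds)):    # predicate extensions
                ext = dict(zip(preds, rows))
                if all(holds(cl, env, ext) for cl in clauses):
                    return "SATISFIABLE", size
    return "CONTRADICTION", None                       # step 4

# p(a) and ~p(b): size 1 forces a = b, so the first model has size 2.
clauses = [[(True, "p", "a")], [(False, "p", "b")]]
print(solve(["a", "b"], ["p"], clauses, 3))  # ('SATISFIABLE', 2)
```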


The main part of Paradox is implemented in Haskell using the GHC compiler. Paradox also has a built-in incremental SAT solver which is written in C++. The two parts are linked together on the object level using Haskell's Foreign Function Interface.

Expected Competition Performance

Paradox 3.0 is the CASC-22 FNT division winner.

Satallax 1.4

Chad E. Brown
Saarland University, Germany


Satallax is an automated theorem prover for higher-order logic. The particular form of higher-order logic supported by Satallax is Church's simple type theory with extensionality and choice operators. The SAT solver MiniSat [
ES04] is responsible for much of the search for a proof.

The theoretical basis of search is a complete ground tableau calculus for higher-order logic [BS10] with a choice operator [BB10]. A problem is given using an S-expression variant of the THF format. A branch is formed from the axioms of the problem and negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch.

Satallax progressively generates higher-order formulae and corresponding propositional clauses. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat as an engine to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. If there are no quantifiers at function types, the generation of higher-order formulae and corresponding clauses may terminate [BS09]. In such a case, if MiniSat reports the final set of clauses as satisfiable, then the original set of higher-order formulae is satisfiable (by a standard model in which all types are interpreted as finite sets).
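The propositional abstraction can be sketched as follows, with a tiny DPLL search standing in for MiniSat (all names are illustrative, not Satallax's):

```python
# Sketch of the propositional abstraction described above: each higher-order
# formula is mapped to a propositional atom, and tableau-rule instances
# become clauses over those atoms. A tiny DPLL search stands in for MiniSat.

def dpll(clauses, assignment=()):
    assignment = dict(assignment)
    simplified = []
    for c in (set(c) for c in clauses):
        if any(assignment.get(abs(l)) == (l > 0) for l in c):
            continue                       # clause already satisfied
        c = {l for l in c if abs(l) not in assignment}
        if not c:
            return False                   # empty clause: conflict
        simplified.append(c)
    if not simplified:
        return True                        # all clauses satisfied
    v = abs(next(iter(simplified[0])))     # split on some variable
    return (dpll(simplified, {**assignment, v: True}.items()) or
            dpll(simplified, {**assignment, v: False}.items()))

atom = {}
def abstract(formula):
    # Map a (string) higher-order formula to a fresh propositional variable.
    return atom.setdefault(formula, len(atom) + 1)

a = abstract("p c")
print(dpll([[a], [-a]]))   # False: the abstraction is unsatisfiable
print(dpll([[a, -a]]))     # True
```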


There are a number of flags that control the order in which instances of tableau rules are considered. Other flags activate some optional extensions to the basic proof procedure. Three such extensions have proven particularly successful.
  1. A preprocessing stage may split the original problem into several problems to be solved independently. This is helpful, for example, when the claim to be proven is an equivalence.
  2. The set of all (ground) subterms of the problem can be preemptively included in the set of all allowed instantiations for quantifiers. This is helpful when strictly following the tableau calculus rules would require taking a roundabout path to licensing a subterm as a legal instantiation.
  3. Some universally quantified formulae can be treated as higher-order clauses with strict occurrences of higher-order variables. Whenever new (ground) formulae are considered, pattern matching can be used to create ground instances of higher-order clauses. These ground clauses are included in the clause set sent to MiniSat.
A collection of flag settings is called a mode. Approximately 250 modes have been tried so far. Regardless of the mode, the search procedure is sound and complete for higher-order logic with choice. This implies that if search terminates with a particular mode, then we can conclude that the original set of formulae is unsatisfiable or satisfiable.

A strategy is an ordered collection of modes with information about how much time the mode should be allotted. Satallax tries each of the modes for a certain amount of time sequentially. Satallax 1.4 uses a default strategy consisting of 22 modes determined through experimentation using the THF problems in the TPTP library.


Satallax is implemented in Steel Bank Common Lisp. Satallax calls MiniSat which is implemented in C++. Satallax is available from:

Expected Competition Performance

Since Satallax supports theorem proving with a choice operator, there should be a few problems only Satallax can solve. In addition, the use of MiniSat to do the propositional part of the search seems to be very effective. Preliminary tests show Satallax to be competitive. On the other hand, the primary instantiation technique for higher-order quantifiers is essentially a generate-and-test method. This means most problems that require nontrivial higher-order instantiations will be out of reach for the time being.

SNARK 20080805r027

Mark Stickel
SRI International, USA


SNARK 20080805r027 is a resolution and paramodulation theorem prover for first-order predicate calculus. Procedural attachment is used to evaluate arithmetic expressions. The objective to date is to add some arithmetic calculation in support of SNARK's applications, not to duplicate or incorporate symbolic algebra systems to extend SNARK's mathematical range. SNARK has a rudimentary type system that includes subtypes but not polymorphism. Thus, the mapping onto TPTP arithmetic types, which lack subtypes, and TPTP arithmetic operations, which are polymorphic, is approximate. It uses Common Lisp's unlimited precision rational arithmetic. Integers are a subtype. Inexact floating-point arithmetic is not used; real number inputs are converted to rationals. Real numbers are a supertype so that irrational numbers can be represented symbolically.

Procedural attachment is used to evaluate arithmetic expressions that are fully instantiated with numeric arguments. For example, ($$SUM 2 2) and ($$LESS 2 3) can be rewritten to 4 and TRUE. The equality relation is also procedurally attached when an argument is numeric so that, for example, (= ($$SUM ?X 2) 4) can be solved.
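The rewriting described above can be sketched as follows (hypothetical Python, not SNARK's Common Lisp; only the operations from the example are covered, and the equation solver handles just the one-unknown sum case):

```python
from fractions import Fraction

# Sketch of procedural attachment: fully numeric $$SUM/$$LESS terms are
# rewritten to their values, and an equality with one numeric unknown under
# $$SUM is solved by inverting the sum. Unlimited precision rationals stand
# in for Common Lisp's rational arithmetic.

def evaluate(term):
    if not isinstance(term, tuple):
        return term
    op, *args = term
    args = [evaluate(a) for a in args]
    if all(isinstance(a, Fraction) for a in args):
        if op == "$$SUM":
            return args[0] + args[1]
        if op == "$$LESS":
            return args[0] < args[1]
    return (op, *args)          # not fully instantiated: leave symbolic

def solve_eq(lhs, rhs):
    # Solve (= ($$SUM ?X k) n) for ?X when k and n are numeric.
    if isinstance(lhs, tuple) and lhs[0] == "$$SUM" and lhs[1] == "?X":
        return rhs - lhs[2]
    return None

two, three, four = Fraction(2), Fraction(3), Fraction(4)
print(evaluate(("$$SUM", two, two)))         # 4
print(evaluate(("$$LESS", two, three)))      # True
print(solve_eq(("$$SUM", "?X", two), four))  # 2
```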


SNARK is written in Common Lisp that is also used as its extension and scripting language. SNARK is available from

SPASS+T 2.2.12

Uwe Waldmann1, Stephan Zimmer2
1Max-Planck-Institut für Informatik, Germany, 2AbsInt GmbH, Germany


SPASS+T is an extension of the superposition-based theorem prover SPASS that integrates algebraic knowledge into SPASS in three complementary ways: by passing derived formulae to an external SMT procedure (currently Yices or CVC3), by adding standard axioms, and by built-in arithmetic simplification and inference rules. A first version of the system has been described in [
PW06]. In the current version, a much more sophisticated coupling of the SMT procedure has been added [Zim07].


Standard axioms and built-in arithmetic simplification and inference rules are integrated into the standard main loop of SPASS. The external SMT procedure runs in parallel in a separate process, leading occasionally to non-deterministic behaviour.


SPASS+T is implemented in C. The system is available from

SPASS-XDB 3.01X0.6

Geoff Sutcliffe1, Martin Suda2
1University of Miami, USA, 2Charles University in Prague, Czech Republic


SPASS-XDB [SS+09,SS+10] is an extended version of the well-known, state-of-the-art SPASS automated theorem proving system [WF+09]. The original SPASS reads a problem, consisting of axioms and a conjecture, in TPTP format from a file, and searches for a proof by refutation of the conjecture. SPASS-XDB adds the capability of retrieving extra positive unit axioms (facts) from external sources during the proof search (hence the "XDB", standing for eXternal DataBases). The axioms are retrieved asynchronously, on demand, based on an expectation that they will contribute to completing the proof. The axioms are retrieved from a range of external sources, including SQL databases, SPARQL endpoints, WWW services, computation sources (e.g., computer algebra systems), etc., using a TPTP standard protocol.
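The on-demand retrieval can be sketched as follows, with a stub table standing in for the external SQL/SPARQL/WWW sources (all names and the data are illustrative, not SPASS-XDB's):

```python
# Sketch of on-demand external axiom retrieval: a stub table stands in for
# SQL databases and SPARQL endpoints; in the real system the retrieval is
# asynchronous and uses a TPTP standard protocol.

EXTERNAL = {   # one source per predicate, delivering positive unit facts
    "capital": lambda: [("capital", "france", "paris"),
                        ("capital", "ireland", "dublin")],
}

def retrieve(predicate, cache):
    """Fetch facts for `predicate` once; cached for the rest of the search."""
    if predicate not in cache and predicate in EXTERNAL:
        cache[predicate] = EXTERNAL[predicate]()
    return cache.get(predicate, [])

def prove_goal(goal, cache):
    # A goal literal is resolved against the retrieved unit axioms.
    return any(fact == goal for fact in retrieve(goal[0], cache))

cache = {}
print(prove_goal(("capital", "ireland", "dublin"), cache))  # True
print(prove_goal(("capital", "ireland", "cork"), cache))    # False
```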

For the TFA division, the TFF formulae are converted to FOF using the standard approach [Wal83], but without predicates to check the types of numeric variables. This means SPASS-XDB is unsound for certain types of conjectures that mix different types of numbers (e.g., integers with reals), but those problems will not arise in CASC. Until recently a simple computation system implemented in Prolog was used (and it is still available) to obtain ground arithmetic facts, as required to resolve against arithmetic relations (i.e., all arithmetic function evaluation had to be done inside an $evaleq/2 relation). More recently this has been replaced by internal evaluation of ground arithmetic relations and functions. The external mechanism is now being used to develop more sophisticated arithmetic capabilities, e.g., solving for a variable in an equation involving arithmetic functions, such as solving for X in $sum(2,X) = 5. It is hoped this will be ready in time for CASC. SPASS-XDB supports integer, rational, and finite precision real arithmetic.


Generally, SPASS-XDB follows SPASS' strategies. However, SPASS, like most (all?) ATP systems, was designed under the assumption that all formulae are in the problem file, i.e., it is unaware that external axioms might be delivered. To regain completeness, constraints on SPASS' search had to be relaxed in SPASS-XDB. This increases the search space, so the constraints were relaxed in a controlled, incremental fashion [SS+10]. The search space is also affected by the number of external axioms that can be delivered, and mechanisms to control the delivery and focus the consequent search have been implemented [SS+10].


SPASS-XDB, as an extension of SPASS, is written in C. The internal arithmetic is done using the GMP multiple precision arithmetic library. Reals are converted to rationals for computation, but results are presented in real format. The internal arithmetic computation system used for the TFA division is written in SWI Prolog, and compiled to an executable. SPASS-XDB is available for use online in the SystemOnTPTP interface:

TPS 3.080227G1d

Peter B. Andrews1, Chad E. Brown2
1Carnegie Mellon University, USA, 2Saarland University, Germany


TPS is a higher-order theorem proving system that has been developed over several decades under the supervision of Peter B. Andrews with substantial work by Eve Longini Cohen, Dale A. Miller, Frank Pfenning, Sunil Issar, Carl Klapper, Dan Nesmith, Hongwei Xi, Matthew Bishop, Chad E. Brown, and Mark Kaminski. TPS can be used to prove theorems of Church's type theory automatically, interactively, or semi-automatically [

When searching for a proof, TPS first searches for an expansion proof [Mil87] or an extensional expansion proof [Bro07] of the theorem. Part of this process involves searching for acceptable matings [And81]. Using higher-order unification, a pair of occurrences of subformulae (which are usually literals) is mated appropriately on each vertical path through an expanded form of the theorem to be proved. The expansion proof thus obtained is then translated [Pfe87] without further search into a proof of the theorem in natural deduction style. The translation process provides an effective soundness check, but it is not actually used during the competition.


Strategies used by TPS in the search process include:


TPS has been developed as a research tool for developing, investigating, and refining a variety of methods of searching for expansion proofs, and variations of these methods. Its behavior is controlled by hundreds of flags. A set of flags, with values for them, is called a mode. 52 modes have been found which collectively suffice for automatically proving virtually all the theorems which TPS has proved automatically thus far. When searching for a proof in automatic mode, TPS tries each of these modes in turn for a specified amount of time.

TPS is implemented in Common Lisp, and is available from:


Expected Competition Performance

TPS 3.080227G1d is the CASC-22 THF division winner.

Vampire 10.0

Andrei Voronkov
University of Manchester, United Kingdom

No system description supplied.

Expected Competition Performance

Vampire 10.0 is the CASC-22 CNF division winner.

Vampire 11.0 and Vampire-LTB 11.0

Andrei Voronkov, Kryštof Hoder
University of Manchester, United Kingdom


Vampire 11.0 is an automatic theorem prover for first-order classical logic. It consists of a shell and a kernel. The kernel implements the calculi of ordered binary resolution and superposition for handling equality. The splitting rule in the kernel adds propositional parts to clauses, which are manipulated using binary decision diagrams (BDDs). A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering.

Substitution tree indexing is used to implement all major operations on sets of terms and literals. Although the kernel of the system works only with clausal normal forms, the shell accepts a problem in the full first-order logic syntax, clausifies it and performs a number of useful transformations before passing the result to the kernel. Also the axiom selection algorithm of SInE can be enabled as part of the preprocessing.

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.


The Vampire 11.0 kernel provides a fairly large number of features for strategy selection. The most important ones are:

The Vampire 11.0 core is a completely new Vampire, and it shares virtually no code with the previous versions. The automatic mode of Vampire 11.0, however, uses both the Vampire 11.0 core and Vampire 10.0 to solve problems, based on an input problem classification that takes into account simple syntactic properties, such as being Horn or non-Horn, presence of equality, etc.


Vampire 11.0 is implemented in C++. The supported compilers are gcc 4.1.x and gcc 4.3.x.

Expected Competition Performance

Vampire 11.0 is the CASC-22 FOF division winner. Vampire-LTB 11.0 is the CASC-22 LTB division winner.

Waldmeister C09a

Thomas Hillenbrand
Max-Planck-Institut für Informatik, Germany


Waldmeister C09a [
Hil03] is a system for unit equational deduction. Its theoretical basis is unfailing completion in the sense of [BDP89] with refinements towards ordered completion (cf. [AHL03]). The system saturates the input axiomatization, distinguishing active facts, which induce a rewrite relation, and passive facts, which are the one-step conclusions of the active ones up to redundancy. The saturation process is parameterized by a reduction ordering and a heuristic assessment of passive facts [HJL99].

Waldmeister C09a is a minor upgrade of Waldmeister 806, the best feature of which is TPTP format output.


The approach taken to control the proof search is to choose the search parameters according to the algebraic structure given in the problem specification [HJL99]. This is based on the observation that proof tasks sharing major parts of their axiomatization often behave similarly. Hence, for a number of domains, the influence of different reduction orderings and heuristic assessments has been analyzed experimentally; and in most cases it has been possible to distinguish a strategy uniformly superior on the whole domain. In essence, every such strategy consists of an instantiation of the first parameter to a Knuth-Bendix ordering or to a lexicographic path ordering, and an instantiation of the second parameter to one of the weighting functions addweight, gtweight, or mixweight, which, if called on an equation s = t, return |s| + |t|, |max>(s,t)|, or |max>(s,t)| · (|s| + |t| + 1) + |s| + |t|, respectively, where |s| denotes the number of symbols in s.
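The three weighting functions can be transcribed directly; |s| is the number of symbols of s, and term size is used below as a stand-in for the max> comparison under the reduction ordering (the term representation is illustrative):

```python
# The weighting functions addweight, gtweight, and mixweight from above,
# transcribed directly. Terms are nested tuples ("f", arg1, ...); size is
# used as a stand-in for picking the larger term in the reduction ordering.

def size(t):
    """|t|: the number of symbols in t."""
    return 1 if not isinstance(t, tuple) else 1 + sum(size(a) for a in t[1:])

def max_gt(s, t):
    # Stand-in for max>(s,t): the larger term, here by symbol count.
    return s if size(s) >= size(t) else t

def addweight(s, t):
    return size(s) + size(t)

def gtweight(s, t):
    return size(max_gt(s, t))

def mixweight(s, t):
    m = size(max_gt(s, t))
    return m * (size(s) + size(t) + 1) + size(s) + size(t)

s = ("f", ("g", "x"), "y")   # |s| = 4
t = "x"                      # |t| = 1
print(addweight(s, t), gtweight(s, t), mixweight(s, t))  # 5 4 29
```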


The prover is coded in ANSI-C. It runs on Solaris, Linux, and newly also on MacOS X. In addition, it is now available for Windows users via the Cygwin platform. The central data structures are: perfect discrimination trees for the active facts; group-wise compressions for the passive ones; and sets of rewrite successors for the conjectures. Visit the Waldmeister web pages at:

Expected Competition Performance

Waldmeister C09a is the CASC-22 UEQ division winner.

Waldmeister 710

Thomas Hillenbrand
Max-Planck-Institut für Informatik, Germany


Waldmeister 710 [
Hil03] is a system for unit equational deduction. Its theoretical basis is unfailing completion in the sense of [BDP89] with refinements towards ordered completion (cf. [AHL03]). The system saturates the input axiomatization, distinguishing active facts, which induce a rewrite relation, and passive facts, which are the one-step conclusions of the active ones up to redundancy. The saturation process is parameterized by a reduction ordering and a heuristic assessment of passive facts [HJL99]. This year's version is the result of polishing and fixing a few things in last year's.


The approach taken to control the proof search is to choose the search parameters – reduction ordering and heuristic assessment – according to the algebraic structure given in the problem specification [HJL99]. This is based on the observation that proof tasks sharing major parts of their axiomatization often behave similarly.


The prover is coded in ANSI-C. It runs on Solaris, Linux, MacOS X, and Windows/Cygwin. The central data structures are: perfect discrimination trees for the active facts; group-wise compressions for the passive ones; and sets of rewrite successors for the conjectures. Visit the Waldmeister web pages at:

Expected Competition Performance

The system should again be rather strong, and close to the previous version.

Zenon 0.6.3

Damien Doligez
INRIA, France


Zenon 0.6.3 [
BDD07] is a theorem prover based on a proof-confluent version of analytic tableaux. It uses all the usual tableau rules for first-order logic with a rule-based handling of equality.

Zenon outputs formal proofs that can be checked by Coq or Isabelle.


Zenon is a fully automatic black-box design with no user-selectable strategies.


Zenon is implemented in OCaml. Its most interesting data structure is the representation of first-order formulae and terms: they are hash-consed modulo alpha conversion. Zenon is available from:
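Hash-consing modulo alpha conversion can be sketched as follows (an assumed scheme, not Zenon's OCaml code): converting to de Bruijn form makes alpha-equivalent terms structurally equal, and interning then makes equality a pointer comparison.

```python
# Sketch of hash-consing modulo alpha conversion (assumed scheme): terms are
# put in de Bruijn form, so alpha-equivalent terms become structurally equal,
# then interned so equal terms share one representative object.

TABLE = {}

def intern(t):
    # Return the shared representative for t (tuples hash by value).
    return TABLE.setdefault(t, t)

def debruijn(term, env=()):
    kind = term[0]
    if kind == "var":                      # ("var", x): bound variable
        return intern(("idx", env.index(term[1])))
    if kind == "lam":                      # ("lam", x, body)
        return intern(("lam", debruijn(term[2], (term[1],) + env)))
    return intern(("app", debruijn(term[1], env), debruijn(term[2], env)))

a = debruijn(("lam", "x", ("var", "x")))
b = debruijn(("lam", "y", ("var", "y")))   # alpha-equivalent to the first
print(a is b)  # True: one shared representative, equality is identity
```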

Expected Competition Performance

This version is only slightly better than the version of 2008, and the handling of equality is still really bad, so Zenon is again expected to rank in the lower part of the field.


Akbarpour B., Paulson L. (2010), MetiTarski: An Automatic Theorem Prover for Real-Valued Special Functions, Journal of Automated Reasoning 44(3), 175-205.

Andrews P. B., Bishop M., Issar S., Nesmith. D., Pfenning F., Xi H. (1996), TPS: A Theorem-Proving System for Classical Type Theory, Journal of Automated Reasoning 16(3), 321-353.

Andrews P. B., Brown C.E. (2006), TPS: A Hybrid Automatic-Interactive System for Developing Proofs, Journal of Applied Logic 4(4), 367-395.

Andrews P. B. (1981), Theorem Proving via General Matings, Journal of the ACM 28(2), 193-214.

Andrews P. B. (1989), On Connections and Higher-Order Logic, Journal of Automated Reasoning 5(3), 257-291.

Avenhaus J., Hillenbrand T., Löchner B. (2003), On Using Ground Joinable Equations in Equational Theorem Proving, Journal of Symbolic Computation 36(1-2), 217-233, Elsevier Science.

Bachmair L., Dershowitz N., Plaisted D.A. (1989), Completion Without Failure, Ait-Kaci H., Nivat M., Resolution of Equations in Algebraic Structures, 1-30, Academic Press.

Bachmair L., Dershowitz N. (1988), Critical Pair Criteria for Completion, Journal of Symbolic Computation 6(1), 1-18.

Bachmair L., Ganzinger H. (1998), Equational Reasoning in Saturation-Based Theorem Proving, Bibel W., Schmitt P.H., Automated Deduction - A Basis for Applications, Volume I: Foundations - Calculi and Methods, 352-397, Applied Logic Series 10, Kluwer Academic Publishers.

Backes J., Brown C. (2010), Analytic Tableaux for Higher-Order Logic with Choice, Giesl J., Hähnle R., Proceedings of the 5th International Joint Conference on Automated Reasoning (Edinburgh, United Kingdom), To appear, Lecture Notes in Artificial Intelligence.

Barrett C., Tinelli C. (2007), CVC3, Damm W., Hermanns H., Proceedings of the 19th International Conference on Computer Aided Verification (Berlin, Germany), 298-302, Lecture Notes in Computer Science 4590, Springer-Verlag.

Baumgartner P., Fuchs A., Tinelli C. (2004), Darwin - A Theorem Prover for the Model Evolution Calculus, Sutcliffe G., Schulz S., Tammet T., Proceedings of the Workshop on Empirically Successful First Order Reasoning, 2nd International Joint Conference on Automated Reasoning (Cork, Ireland).

Baumgartner P., Fuchs A., Tinelli C. (2006), Lemma Learning in the Model Evolution Calculus, Hermann M., Voronkov A., Proceedings of the 13th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (Phnom Penh, Cambodia), 572-585, Lecture Notes in Artificial Intelligence 4246.

Baumgartner P., Furbach U., Niemelä I. (1996), Hyper Tableaux, Alferes J., Pereira L., Orlowska E., Proceedings of JELIA'96: European Workshop on Logic in Artificial Intelligence (Evora, Portugal), 1-17, Lecture Notes in Artificial Intelligence 1126, Springer-Verlag.

Baumgartner P., Furbach U., Pelzer B. (2007), Hyper Tableaux with Equality, Pfenning F., Proceedings of the 21st International Conference on Automated Deduction (Bremen, Germany), 492-507, Lecture Notes in Artificial Intelligence 4603, Springer-Verlag.

Baumgartner P., Tinelli C. (2003), The Model Evolution Calculus, Baader F., Proceedings of the 19th International Conference on Automated Deduction (Miami, USA), 350-364, Lecture Notes in Artificial Intelligence 2741, Springer-Verlag.

Baumgartner P., Tinelli C. (2005), The Model Evolution Calculus with Equality, Nieuwenhuis R., Proceedings of the 20th International Conference on Automated Deduction (Tallinn, Estonia), 392-408, Lecture Notes in Artificial Intelligence 3632, Springer-Verlag.

Benzmüller C., Kohlhase M. (1998), LEO - A Higher-Order Theorem Prover, Kirchner C., Kirchner H., Proceedings of the 15th International Conference on Automated Deduction (Lindau, Germany), 139-143, Lecture Notes in Artificial Intelligence 1421, Springer-Verlag.

Benzmüller C., Paulson L., Theiss F., Fietzke A. (2008), LEO-II - A Cooperative Automatic Theorem Prover for Higher-Order Logic, Baumgartner P., Armando A., Gilles D., Proceedings of the 4th International Joint Conference on Automated Reasoning (Sydney, Australia), 162-170, Lecture Notes in Artificial Intelligence 5195, Springer-Verlag.

Benzmüller C., Rabe F., Sutcliffe G. (2008), THF0 - The Core TPTP Language for Classical Higher-Order Logic, Baumgartner P., Armando A., Gilles D., Proceedings of the 4th International Joint Conference on Automated Reasoning (Sydney, Australia), 491-506, Lecture Notes in Artificial Intelligence 5195, Springer-Verlag.

Benzmüller C. (1999), Extensional Higher-order Paramodulation and RUE-Resolution, Ganzinger H., Proceedings of the 16th International Conference on Automated Deduction (Trento, Italy), 399-413, Lecture Notes in Artificial Intelligence 1632, Springer-Verlag.

Bibel W. (1987), Automated Theorem Proving, Vieweg and Sohn.

Bishop M., Andrews P.B. (1998), Selectively Instantiating Definitions, Kirchner C., Kirchner H., Proceedings of the 15th International Conference on Automated Deduction (Lindau, Germany), 365-380, Lecture Notes in Artificial Intelligence 1421, Springer-Verlag.

Bishop M. (1999), A Breadth-First Strategy for Mating Search, Ganzinger H., Proceedings of the 16th International Conference on Automated Deduction (Trento, Italy), 359-373, Lecture Notes in Artificial Intelligence 1632, Springer-Verlag.

Bledsoe W.W. (1971), Splitting and Reduction Heuristics in Automatic Theorem Proving, Artificial Intelligence 2, 55-77.

Bonichon R., Delahaye D., Doligez D. (2007), Zenon: An Extensible Automated Theorem Prover Producing Checkable Proofs, Dershowitz N., Voronkov A., Proceedings of the 14th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (Yerevan, Armenia), 151-165, Lecture Notes in Artificial Intelligence 4790.

Brown C.E., Smolka G. (2009), Terminating Tableaux for the Basic Fragment of Simple Type Theory, Giese M., Waaler A., Proceedings of the 18th International Conference on Automated Reasoning with Analytic Tableaux and Related Methods (Oslo, Norway), 138-151, Lecture Notes in Artificial Intelligence 5697, Springer-Verlag.

Brown C.E., Smolka G. (2009), Extended First-Order Logic, Nipkow T., Urban C., Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics (Munich, Germany), 164-179, Lecture Notes in Computer Science 5674, Springer-Verlag.

Brown C.E., Smolka G. (2010), Analytic Tableaux for Simple Type Theory and its First-Order Fragment, Logical Methods in Computer Science, To appear.

Brown C.E. (2002), Solving for Set Variables in Higher-Order Theorem Proving, Voronkov A., Proceedings of the 18th International Conference on Automated Deduction (Copenhagen, Denmark), 408-422, Lecture Notes in Artificial Intelligence 2392, Springer-Verlag.

Brown C.E. (2003), QEPCAD B - A Program for Computing with Semi-algebraic sets using CADs, ACM SIGSAM Bulletin 37(4), 97-108.

Brown C.E. (2007), Automated Reasoning in Higher-Order Logic: Set Comprehension and Extensionality in Church's Type Theory, Studies in Logic: Logic and Cognitive Systems 10, College Publications.

Böhme S. (2009), Proof Reconstruction for Z3 in Isabelle/HOL, Dutertre B., Strichman O., Proceedings of the 7th International Workshop on Satisfiability Modulo Theories (Montreal, Canada).

Claessen K., Sörensson N. (2003), New Techniques that Improve MACE-style Finite Model Finding, Baumgartner P., Fermüller C., Proceedings of the CADE-19 Workshop: Model Computation - Principles, Algorithms, Applications (Miami, USA).

de Nivelle H., Meng J. (2006), Geometric Resolution: A Proof Procedure Based on Finite Model Search, Furbach U., Shankar N., Proceedings of the 3rd International Joint Conference on Automated Reasoning (Seattle, USA), 303-317, Lecture Notes in Artificial Intelligence 4130, Springer-Verlag.

de Nivelle H. (2007), Redundancy for Geometric Resolution, Ahrendt W., Baumgartner P., de Nivelle H., Proceedings of CADE-21 Workshop on DISPROVING - Non-Theorems, Non-Validity, Non-Provability (Bremen, Germany).

de Nivelle H. (2010), Classical Logic with Partial Functions, Giesl J., Hähnle R., Proceedings of the 5th International Joint Conference on Automated Reasoning (Edinburgh, United Kingdom), To appear, Lecture Notes in Artificial Intelligence.

Denman W., Akbarpour B., Tahar S., Zaki M., Paulson L. (2009), Formal Verification of Analog Designs using MetiTarski, Biere A., Pixley C., Proceedings of Formal Methods in Computer Aided Design 2009 (Austin, USA), 93-100, IEEE.

Eén N., Sörensson N. (2004), An Extensible SAT-solver, Giunchiglia E., Tacchella A., Proceedings of the 6th International Conference on Theory and Applications of Satisfiability Testing (Santa Margherita Ligure, Italy), 502-518, Lecture Notes in Computer Science 2919, Springer-Verlag.

Furbach U., Glöckner I., Pelzer B. (2010), An Application of Automated Reasoning in Natural Language Question Answering, AI Communications 23(2-3), 241-265, IOS Press.

Ganzinger H., Korovin K. (2003), New Directions in Instantiation-Based Theorem Proving, Kolaitis P., Proceedings of the 18th IEEE Symposium on Logic in Computer Science (Ottawa, Canada), 55-64, IEEE Press.

Ganzinger H., Korovin K. (2004), Integrating Equational Reasoning into Instantiation-Based Theorem Proving, Marcinkowski J., Tarlecki A., Proceedings of the 18th International Workshop on Computer Science Logic, 13th Annual Conference of the EACSL (Karpacz, Poland), 71-84, Lecture Notes in Computer Science 3210, Springer-Verlag.

Hillenbrand T., Jaeger A., Löchner B. (1999), Waldmeister - Improvements in Performance and Ease of Use, Ganzinger H., Proceedings of the 16th International Conference on Automated Deduction (Trento, Italy), 232-236, Lecture Notes in Artificial Intelligence 1632, Springer-Verlag.

Hillenbrand T. (2003), Citius altius fortius: Lessons Learned from the Theorem Prover Waldmeister, Dahn I., Vigneron L., Proceedings of the 4th International Workshop on First-Order Theorem Proving (Valencia, Spain), 1-13, Electronic Notes in Theoretical Computer Science 86.1.

Hurd J. (2003), First-Order Proof Tactics in Higher-Order Logic Theorem Provers, Archer M., Di Vito B., Munoz C., Proceedings of the 1st International Workshop on Design and Application of Strategies/Tactics in Higher Order Logics (Rome, Italy), 56-68, NASA Technical Reports NASA/CP-2003-212448.

Issar S. (1990), Path-Focused Duplication: A Search Procedure for General Matings, Dietterich T., Swartout W., Proceedings of the 8th National Conference on Artificial Intelligence (Boston, USA), 221-226, American Association for Artificial Intelligence / MIT Press.

Korovin K., Sticksel C. (2010), iProver-Eq - An Instantiation-Based Theorem Prover with Equality, Giesl J., Hähnle R., Proceedings of the 5th International Joint Conference on Automated Reasoning (Edinburgh, United Kingdom), To appear, Lecture Notes in Artificial Intelligence.

Korovin K. (2008), iProver - An Instantiation-Based Theorem Prover for First-order Logic (System Description), Baumgartner P., Armando A., Gilles D., Proceedings of the 4th International Joint Conference on Automated Reasoning (Sydney, Australia), 292-298, Lecture Notes in Artificial Intelligence 5195.

Korovin K. (2009), An Invitation to Instantiation-Based Reasoning: From Theory to Practice, Podelski A., Voronkov A., Wilhelm R., Volume in Memoriam of Harald Ganzinger, 163-166, Lecture Notes in Computer Science 5663, Springer-Verlag.

Korp M., Sternagel C., Zankl H., Middeldorp A. (2009), Tyrolean Termination Tool 2, Treinen R., Proceedings of the 20th International Conference on Rewriting Techniques and Applications (Brasília, Brazil), 295-304, Lecture Notes in Computer Science 5595.

Kurihara M., Kondo H. (1999), Completion for Multiple Reduction Orderings, Journal of Automated Reasoning 23(1), 25-42.

Letz R., Stenz G. (2001), Model Elimination and Connection Tableau Procedures, Robinson A., Voronkov A., Handbook of Automated Reasoning, 2015-2114, Elsevier Science.

Loechner B. (2004), What to Know When Implementing LPO, Sutcliffe G., Schulz S., Tammet T., Proceedings of the Workshop on Empirically Successful First Order Reasoning, 2nd International Joint Conference on Automated Reasoning (Cork, Ireland).

Loechner B. (2006), Things to Know When Implementing KBO, Journal of Automated Reasoning 36(4), 289-310.

Loveland D.W. (1978), Automated Theorem Proving: A Logical Basis, Elsevier Science.

McCune W.W., Wos L. (1997), Otter: The CADE-13 Competition Incarnations, Journal of Automated Reasoning 18(2), 211-220.

McCune W.W. (2003), Mace4 Reference Manual and Guide, ANL/MCS-TM-264, Argonne National Laboratory (Argonne, USA).

McCune W.W. (2003), Otter 3.3 Reference Manual, ANL/MSC-TM-263, Argonne National Laboratory (Argonne, USA).

Meng J., Paulson L. (2009), Lightweight Relevance Filtering for Machine-generated Resolution Problems, Journal of Applied Logic 7(1), 41-57.

Miller D. (1987), A Compact Representation of Proofs, Studia Logica 46(4), 347-370.

Nipkow T., Paulson L., Wenzel M. (2002), Isabelle/HOL: A Proof Assistant for Higher-Order Logic, Lecture Notes in Computer Science 2283, Springer-Verlag.

Nipkow T. (1989), Equational Reasoning in Isabelle, Science of Computer Programming 12(2), 123-149.

Otten J., Bibel W. (2003), leanCoP: Lean Connection-Based Theorem Proving, Journal of Symbolic Computation 36(1-2), 139-161.

Otten J., Sutcliffe G. (2010), Using the TPTP Language for Representing Derivations in Tableau and Connection Calculi, Konev B., Schmidt R., Schulz S., Proceedings of the Workshop on Practical Aspects of Automated Reasoning, 5th International Joint Conference on Automated Reasoning (Edinburgh, United Kingdom), To appear.

Otten J. (2008), leanCoP 2.0 and ileanCoP 1.2: High Performance Lean Theorem Proving in Classical and Intuitionistic Logic, Baumgartner P., Armando A., Gilles D., Proceedings of the 4th International Joint Conference on Automated Reasoning (Sydney, Australia), 283-291, Lecture Notes in Artificial Intelligence 5195.

Otten J. (2010), Restricting Backtracking in Connection Calculi, AI Communications 23(2-3), 159-182.

Pastre D. (2001), Muscadet 2.3: A Knowledge-based Theorem Prover based on Natural Deduction, Gore R., Leitsch A., Nipkow T., Proceedings of the International Joint Conference on Automated Reasoning (Siena, Italy), 685-689, Lecture Notes in Artificial Intelligence 2083, Springer-Verlag.

Pastre D. (2001), Implementation of Knowledge Bases for Natural Deduction, Nieuwenhuis R., Voronkov A., Proceedings of the 8th International Conference on Logic for Programming, Artificial Intelligence and Reasoning (Havana, Cuba), 29-68, Lecture Notes in Artificial Intelligence 2250, Springer-Verlag.

Pastre D. (2002), Strong and Weak Points of the Muscadet Theorem Prover, AI Communications 15(2-3), 147-160.

Pastre D. (2007), Complementarity of a Natural Deduction Knowledge-based Prover and Resolution-based Provers in Automated Theorem Proving, http://www.math-info.univ-paris5.fr/~pastre/compl-NDKB-RB.pdf.

Pastre D. (1978), Automatic Theorem Proving in Set Theory, Artificial Intelligence 10, 1-27.

Pastre D. (1989), Muscadet: An Automatic Theorem Proving System using Knowledge and Metaknowledge in Mathematics, Artificial Intelligence 38, 257-318.

Pastre D. (1993), Automated Theorem Proving in Mathematics, Annals of Mathematics and Artificial Intelligence 8, 425-447.

Paulson L.C., Nipkow T. (1994), Isabelle: A Generic Theorem Prover, Lecture Notes in Computer Science 828, Springer-Verlag.

Paulson L. (1999), A Generic Tableau Prover and its Integration with Isabelle, Journal of Universal Computer Science 5(3), 73-87.

Pelzer B., Wernhard C. (2007), System Description: E-KRHyper, Pfenning F., Proceedings of the 21st International Conference on Automated Deduction (Bremen, Germany), 508-513, Lecture Notes in Artificial Intelligence 4603, Springer-Verlag.

Pfenning F. (1987), Proof Transformations in Higher-Order Logic, PhD thesis, Carnegie-Mellon University (Pittsburgh, USA).

Prevosto V., Waldmann U. (2006), SPASS+T, Sutcliffe G., Schmidt R., Schulz S., Proceedings of the FLoC'06 Workshop on Empirically Successful Computerized Reasoning, 3rd International Joint Conference on Automated Reasoning (Seattle, USA), 19-33, CEUR Workshop Proceedings 192.

Pugh W. (1992), The Omega Test: A Fast and Practical Integer Programming Algorithm for Dependence Analysis, Communications of the ACM 35(8), 102-114.

Riazanov A., Voronkov A. (2001), Vampire 1.1 (System Description), Gore R., Leitsch A., Nipkow T., Proceedings of the International Joint Conference on Automated Reasoning (Siena, Italy), 376-380, Lecture Notes in Artificial Intelligence 2083, Springer-Verlag.

Roederer A., Puzis Y., Sutcliffe G. (2009), Divvy: An ATP Meta-system based on Axiom Relevance Ordering, Schmidt R., Proceedings of the 22nd International Conference on Automated Deduction (Montreal, Canada), 157-162, Lecture Notes in Artificial Intelligence 5663, Springer-Verlag.

Schulz S. (2002), E: A Brainiac Theorem Prover, AI Communications 15(2-3), 111-126.

Schulz S. (2004), System Abstract: E 0.81, Rusinowitch M., Basin D., Proceedings of the 2nd International Joint Conference on Automated Reasoning (Cork, Ireland), 223-228, Lecture Notes in Artificial Intelligence 3097.

Schulz S. (2004), Simple and Efficient Clause Subsumption with Feature Vector Indexing, Sutcliffe G., Schulz S., Tammet T., Proceedings of the Workshop on Empirically Successful First Order Reasoning, 2nd International Joint Conference on Automated Reasoning (Cork, Ireland).

Suda M., Sutcliffe G., Wischnewski P., Lamotte-Schubert M., de Melo G. (2009), External Sources of Axioms in Automated Theorem Proving, Mertsching B., Proceedings of the 32nd Annual Conference on Artificial Intelligence (Paderborn, Germany), 281-288, Lecture Notes in Artificial Intelligence 5803.

Sutcliffe G., Benzmüller C. (2010), Automated Reasoning in Higher-Order Logic using the TPTP THF Infrastructure, Journal of Formalized Reasoning 3(1), 1-27.

Sutcliffe G., Suda M., Teyssandier A., Dellis N., de Melo G. (2010), Progress Towards Effective Automated Reasoning with World Knowledge, Murray C., Guesgen H., Proceedings of the 23rd International FLAIRS Conference (Daytona Beach, USA), 110-115, AAAI Press.

Theiss F., Benzmüller C. (2006), Term Indexing for the LEO-II Prover, Benzmüller C., Fischer B., Sutcliffe G., Proceedings of the 6th International Workshop on the Implementation of Logics (Phnom Penh, Cambodia), 7-23, CEUR Workshop Proceedings 212.

Ullman J. (1989), Principles of Database and Knowledge-Base Systems, Computer Science Press, Inc.

Voronkov A. (2001), Algorithms, Datastructures, and Other Issues in Efficient Automated Deduction, Gore R., Leitsch A., Nipkow T., Proceedings of the International Joint Conference on Automated Reasoning (Siena, Italy), 13-28, Lecture Notes in Artificial Intelligence 2083, Springer-Verlag.

Walther C. (1983), A Many-Sorted Calculus Based on Resolution and Paramodulation, Bundy A., Proceedings of the 8th International Joint Conference on Artificial Intelligence (Karlsruhe, Germany), 882-891.

Weidenbach C., Fietzke A., Kumar R., Suda M., Wischnewski P., Dimova D. (2009), SPASS Version 3.5, Schmidt R., Proceedings of the 22nd International Conference on Automated Deduction (Montreal, Canada), 140-145, Lecture Notes in Artificial Intelligence 5663, Springer-Verlag.

Wernhard C. (2003), System Description: KRHyper, Fachberichte Informatik 14--2003, Universität Koblenz-Landau (Koblenz, Germany).

Winkler S., Middeldorp A. (2010), Termination Tools in Ordered Completion, Giesl J., Hähnle R., Proceedings of the 5th International Joint Conference on Automated Reasoning (Edinburgh, United Kingdom), To appear, Lecture Notes in Artificial Intelligence.

Winkler S., Sato H., Middeldorp A., Kurihara M. (2010), Optimizing mkbTT, Lynch C., Proceedings of the 21st International Conference on Rewriting Techniques and Applications (Edinburgh, United Kingdom), To appear, Leibniz International Proceedings in Informatics.

Zimmer S. (2007), Intelligent Combination of a First Order Theorem Prover and SMT Procedures, MSc thesis, Saarland University (Saarbrücken, Germany).