Entrants' System Descriptions
CVC4 1.5.2
Andrew Reynolds
University of Iowa, USA
Architecture
CVC4
[BC+11] is an SMT solver based on the DPLL(T) architecture
[NOT06] that includes built-in support for many theories, including linear
arithmetic, arrays, bit vectors, datatypes, finite sets and strings.
It incorporates approaches for handling universally quantified formulas.
For problems involving free function and predicate symbols, CVC4 primarily
uses heuristic approaches based on E-matching for theorems, and finite model
finding approaches for non-theorems.
For problems in pure arithmetic, CVC4 uses techniques for counterexample-guided
quantifier instantiation
[RD+15].
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered
approach.
First, quantified formulas are replaced by fresh Boolean predicates and the
ground theory solver(s) are used in conjunction with the underlying SAT solver
to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers
"unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either
add instances of quantified formulas to the problem, answer "satisfiable", or
return unknown.
Finite model finding in CVC4 targets problems containing background theories
whose quantification is limited to finite and uninterpreted sorts.
In finite model finding mode, CVC4 uses a ground theory of finite cardinality
constraints that minimizes the number of ground equivalence classes, as
described in
[RT+13].
When the problem is satisfiable at the ground level, a candidate model is
constructed that contains complete interpretations for all predicate and
function symbols.
It then adds instances of quantified formulas that are in conflict with the
candidate model, as described in
[RT+13].
If no instances are added, it reports "satisfiable".
Strategies
For handling theorems, CVC4 primarily uses conflict-based quantifier
instantiation
[RTd14] and E-matching.
CVC4 uses a handful of orthogonal trigger selection strategies for E-matching.
For handling non-theorems, CVC4 primarily uses finite model finding techniques.
Since CVC4 with finite model finding is also capable of establishing
unsatisfiability, it is used as a strategy for theorems as well.
For problems in pure arithmetic, CVC4 uses variations of counterexample-guided
quantifier instantiation
[RD+15], which select relevant quantifier instantiations based on models for
counterexamples to quantified formulas.
CVC4 relies on this method both for theorems in TFA and non-theorems in TFN.
Implementation
CVC4 is implemented in C++. The code is available from:
https://github.com/CVC4
Expected Competition Performance
For TFA, CVC4 should perform better than last year due to its use of new
heuristic techniques for non-linear real and integer arithmetic
[RT+17].
For FOF, it should perform slightly better due to improvements in the
implementation of E-matching and several optimizations related to
conflict-based instantiation
[BFR17].
It should perform roughly the same in the FNT division as last year.
E 2.1
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E 2.1
[Sch02,Sch13] is a purely equational theorem prover for many-sorted
first-order logic with equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an instance
of the superposition calculus with negative literal selection and a
number of redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause algorithm,
i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
For the SLH and LTB divisions, a control program uses a SInE-like
analysis to extract reduced axiomatizations that are handed to several
instances of E.
E will probably not use on-the-fly learning this year.
Strategies
Proof search in E is primarily controlled by a literal selection strategy, a
clause selection heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection
strategies.
Clause selection heuristics can be constructed on the fly by combining various
parameterized primitive evaluation functions, or can be selected from a set
of predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take other
clause properties into account.
In particular, the search can prefer clauses from the set of support, or
containing many symbols also present in the goal.
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).
For CASC-26, E implements a strategy-scheduling automatic mode.
The total CPU time available is broken into several (unequal) time slices.
For each time slice, the problem is classified into one of several classes,
based on a number of simple features (number of clauses, maximal symbol arity,
presence of equality, presence of non-unit and non-Horn clauses,...).
For each class, a schedule of strategies is greedily constructed from
experimental data as follows:
The first strategy assigned to a schedule is the the one that solves
the most problems from this class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on
problems not already solved by a preceding strategy.
About 220 different strategies have been evaluated on all untyped
first-order problems from TPTP 6.4.0.
About 90 of these strategies are used in the automatic mode, and about 210
are used in at least one schedule.
Implementation
E is build around perfectly shared terms, i.e. each distinct term is
only represented once in a term bank.
The whole set of terms thus consists of a number of interconnected directed
acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented
using perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a
rewrite link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing
[Sch04].
Superposition and backward rewriting use fingerprint indexing
[Sch12], a new technique combining ideas from feature vector indexing and path
indexing.
Finally, LPO and KBO are implemented using the elegant and efficient algorithms
developed by Bernd Löchner in
[Loe06,Loe06].
The prover and additional information are available at
http://www.eprover.org
Expected Competition Performance
E 2.1 has slightly better strategies than previous versions, and has some
minor improvements in clausification and Set-of-Support implementation.
The system is expected to perform well in most proof classes, but will at
best complement top systems in the disproof classes.
ET 0.2
Josef Urban
Czech Technical University in Prague, Czech Republic
Architecture
ET [KS+15] 0.2 is a metasystem using E prover with specific strategies
[Urb13,KU13,JU17] and preprocessing tools
[KU13,KU13,KU13] that are targeted mainly at problems with many
redundant axioms.
Its design is motivated by the recent experiments in the Large-Theory Batch
division
[KUV15] and on the Flyspeck, Mizar and Isabelle datasets, however, ET does no
learning from related proofs.
Strategies
We characterize formulas by the symbols and terms that they contain,
normalized in various ways.
Then we run various algorithms that try to remove the redundant axioms and
use special strategies on such problems.
Implementation
The metasystem is implemented in ca. 1000 lines of Perl. It uses a
number of external programs, some of them based on E's code base, some
of them independently implemented in C++.
Expected Competition Performance
ET can solve some problems that E 1.8 cannot prove, and even some TPTP
problems with rating 1.00.
The CASC performance should not be much worse than that of E, possibly better,
depending on problem selection.
iProver 2.5
Kontantin Korovin
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen
[GK03,
Kor13]
which is complete for first-order logic.
iProver combines first-order reasoning with ground reasoning for which it
uses MiniSat
[ES04]
and optionally PicoSAT
[Bie08]
(only MiniSat will be used at this CASC).
iProver also combines instantiation with ordered resolution; see
[Kor08,
Kor13]
for the implementation details.
The proof search is implemented using a saturation process based on
the given clause algorithm. iProver uses non-perfect discrimination trees
for the unification indexes, priority queues for passive clauses, and
a compressed vector index for subsumption and subsumption resolution
(both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints
[GK04,
Kor08];
global subsumption
[Kor08];
resolution-based simplifications and propositional-based simplifications.
A compressed feature vector index is used for efficient forward/backward
subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality.
Recent changes in iProver include improved preprocessing and incremental
finite model finding; support of the AIG format for hardware verification and
model-checking (implemented with Dmitry Tsarkov).
In the LTB division, iProver uses axiom selection based on the Sine algorithm
[HV11]
as implemented in Vampire
[KV13],
i.e., axiom selection is done by Vampire and proof attempts are done by
iProver.
Some of iProver features are summarised below.
- proof extraction for both instantiation and resolution
[KS12],
- model representation, using first-order definitions in term algebra
[KS12],
- answer substitutions,
- semantic filtering,
- incremental finite model finding,
- sort inference, monotonic
[CLS11]
and non-cyclic
[Kor13] sorts,
- predicate elimination
[KK16].
Sort inference is targeted at improving finite model finding and symmetry
breaking.
Semantic filtering is used in preprocessing to eliminated irrelevant clauses.
Proof extraction is challenging due to simplifications such global subsumption
which involve global reasoning with the whole clause set and can be
computationally expensive.
Strategies
iProver has around 60 options to control the proof search including options
for literal selection, passive clause selection, frequency of calling the
SAT solver, simplifications and options for combination of instantiation
with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
For the LTB and FNT divisions several strategies are run in parallel.
Implementation
Prover is implemented in OCaml and for the ground reasoning uses MiniSat
[ES04].
iProver accepts FOF and CNF formats.
Vampire
[KV13,
HK+12]
and E prover
[Sch13]
are used for proof-producing clausification of FOF problems,
Vampire is also used for axiom selection
[HV11]
in the LTB division.
iProver is available at:
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 2.5 is the CASC-J8 THF division winner.
iProver 2.6
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen
[GK03,Kor13] which is complete for first-order logic.
iProver combines first-order reasoning with ground reasoning for which it uses
MiniSat
[ES04] and optionally PicoSAT
[Bie08] (only MiniSat will be used at this CASC).
iProver also combines instantiation with ordered resolution; see
[Kor08,Kor13] for the implementation details.
The proof search is implemented using a saturation process based on the given
clause algorithm.
iProver uses non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses, and a compressed vector index for
subsumption and subsumption resolution (both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints
[GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications.
A compressed feature vector index is used for efficient forward/backward
subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality.
Recent changes in iProver include improved preprocessing and incremental
finite model finding; support for the TFF format restricted to clauses; the
AIG format for hardware verification and QBF reasoning.
In the LTB and SLH divisions, iProver combines an abstraction-refinement
framework
[HK17] with axiom selection based on the SinE algorithm
[HV11] as implemented in Vampire
[KV13], i.e., axiom selection is done by Vampire and proof attempts are done by
iProver.
Some of iProver features are summarised below.
- proof extraction for both instantiation and resolution
[KS12],
- model representation, using first-order definitions in term algebra
[KS12],
- answer substitutions,
- semantic filtering,
- incremental finite model finding,
- sort inference, monotonic
[CLS11] and non-cyclic
[Kor13] sorts,
- support for the TFF format restricted to clauses,
- predicate elimination
[KK16].
Sort inference is targeted at improving finite model finding and symmetry
breaking.
Semantic filtering is used in preprocessing to eliminated irrelevant clauses.
Proof extraction is challenging due to simplifications such global subsumption
which involve global reasoning with the whole clause set and can be
computationally expensive.
Strategies
iProver has around 60 options to control the proof search including options
for literal selection, passive clause selection, frequency of calling the
SAT solver, simplifications and options for combination of instantiation with
resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
For the LTB, SLH and FNT divisions several strategies are run in parallel.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat
[ES04].
iProver accepts FOF, TFF and CNF formats.
Vampire
[KV13,HK+12] and E prover
[Sch13] are used for proof-producing clausification of FOF/TFF problems,
Vampire is also used for axiom selection [HV11] in the LTB/SLH divisions.
iProver is available at:
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
Compared to the last year, we integrated an abstraction-refinement framework
[HK17] which we expect to improve performance in the LTB and SLH divisions.
There are a several general improvements that should positively affect overall
performance.
iProverModulo 2.5-0.1
Guillaume Burel
ENSIIE, University Paris‑Saclay, France
Architecture
iProverModulo
[Bur11] is an extension of iProver
[Kor08] to integrate Polarized resolution modulo
[Dow10].
Polarized resolution modulo consists in presenting the theory in which the
problem has to be solved by means of polarized rewriting rules instead of
axioms.
It can also be seen as a combination of the set-of-support strategy and
selection of literals.
iProverModulo consists of two tools: First, autotheo is a theory preprocessor
that converts the axioms of the input into rewriting rules that can be used
by Polarized resolution modulo.
Second, these rewriting rules are handled by a patched version of iProver 2.5
that integrates Polarized resolution modulo.
The integration of polarized resolution modulo in iProver only affects its
ordered resolution calculus, so that the instantiation calculus is untouched.
iProverModulo 2.5+0.1 outputs a proof that is made of two parts:
First, autotheo prints a derivation of the transformation of the axioms
into rewriting rules. This derivation is in TSTP format and includes
the CNF conversions obtained from E.
Second, the modified version of iProver outputs a proof in TSTP format from
this set of rewriting rules and the other input formulas.
Strategies
Autotheo is first run to transform the formulas of the problem whose role is
"axiom" into polarized rewriting rules.
Autotheo offers a set of strategies to that purpose.
For the competition, the Equiv and the ClausalAll strategies will be used.
The former strategy orients formulas intuitively depending of their shape.
It may be incomplete, so that the prover may give up in certain cases.
However, it shows interesting results on some problems.
The second strategy should be complete, at least when equality is not
involved.
The rewriting system for the first strategy is tried for half the time
given for the problem, then the prover is restarted with the second strategy
if no proof has been found.
The patched version of iProver is run on the remaining formulas modulo the
rewriting rules produced by autotheo.
No scheduling is performed.
To be compatible with Polarized resolution modulo, literals are selected only
when they are maximal w.r.t. a KBO ordering, and orphans are not eliminated.
To take advantage of Polarized resolution modulo, the resolution calculus is
triggered more often than the instantiation calculus, on the contrary to the
original iProver.
Normalization of clauses w.r.t. the term rewriting system produced by autotheo
is performed by transforming these rules into an OCaml program, compiling this
program, and dynamically linking it with the prover.
Implementation
iProverModulo is available as a patch to iProver.
The most important additions are the plugin-based normalization engine and
the handling of polarized rewriting rules.
iProverModulo is available from
http://www.ensiie.fr/~guillaume.burel/blackandwhite_iProverModulo.html.en
Since iProverModulo needs to compile rewriting rules, an OCaml compiler is
also provided.
Autotheo is available independently from iProverModulo from
http://www.ensiie.fr/~guillaume.burel/blackandwhite_autotheo.html.en
Autotheo uses E to compute clausal normal form of formula.
The version of E it uses is very slightly modified to make it print the CNF
derivation even if no proof is found.
Both of autotheo and iProver are written in OCaml.
For the SLD division, iProverModulo uses the CNF transformation tool provided
with the Logtk library
[Cru14].
Expected Competition Performance
Although iProverModulo is now based on version 2.5 of iProver, no great
improvement of performance is expected compared to CASC-25, since only the
resolution part of iProver, which is relatively stable, has been modified.
Isabelle 2016
Jasmin Blanchette
Vrije Universiteit Amsterdam, Netherlands
Architecture
Isabelle/HOL 2016
[NPW02]
is the higher-order logic incarnation of the generic proof assistant
Isabelle2016.
Isabelle/HOL provides several automatic proof tactics, notably an equational
reasoner
[Nip89],
a classical reasoner
[PN94],
and a tableau prover
[Pau99].
It also integrates external first- and higher-order provers via its subsystem
Sledgehammer
[PB10,BBP11].
Isabelle includes a parser for the TPTP syntaxes CNF, FOF, TFF0, and THF0, due
to Nik Sultana. It also includes TPTP versions of its popular tools, invokable
on the command line as isabelle tptp_tool max_secs
file.p. For example:
isabelle tptp_isabelle_hot 100 SEU/SEU824^3.p
Isabelle is available in two versions.
The HOT version (which is not participating in CASC-J8) includes LEO-II
[BP+08]
and Satallax
[Bro12]
as Sledgehammer backends, whereas the competition version leaves them out.
Strategies
The Isabelle tactic submitted to the competition simply tries the
following tactics sequentially:
- sledgehammer
- Invokes the following sequence of provers as oracles via Sledgehammer:
- satallax - Satallax 2.7
[Bro12] (HOT version only);
- leo2 - LEO-II 1.6.2
[BPTF08] (HOT version only);
- spass - SPASS 3.8ds
[BP+12];
- vampire - Vampire 3.0 (revision 1435)
[RV02];
- e - E 1.8
[Sch04];
- nitpick
- For problems involving only the type $o of Booleans, checks
whether a finite model exists using Nitpick
[BN10].
- simp
- Performs equational reasoning using rewrite rules
[Nip89].
- blast
- Searches for a proof using a fast untyped tableau prover and then
attempts to reconstruct the proof using Isabelle tactics
[Pau99].
- auto+spass
- Combines simplification and classical reasoning
[PN94]
under one roof; then invoke Sledgehammer with SPASS on any subgoals that
emerge.
- z3
- Invokes the SMT solver Z3 4.4.0
[dMB08].
- cvc4
- Invokes the SMT solver CVC4 1.5pre
[BT07].
- fast
- Searches for a proof using sequent-style reasoning, performing a
depth-first search
[PN94].
Unlike blast, it construct proofs directly in Isabelle.
That makes it slower but enables it to work in the presence of the more
unusual features of HOL, such as type classes and function unknowns.
- best
- Similar to fast, except that it performs a best-first search.
- force
- Similar to auto, but more exhaustive.
- meson
- Implements Loveland's MESON procedure
[Lov78].
Constructs proofs directly in Isabelle.
- fastforce
- Combines fast and force.
Implementation
Isabelle is a generic theorem prover written in Standard ML. Its meta-logic,
Isabelle/Pure, provides an intuitionistic fragment of higher-order logic. The
HOL object logic extends pure with a more elaborate version of higher-order
logic, complete with the familiar connectives and quantifiers. Other object
logics are available, notably FOL (first-order logic) and ZF
(Zermelo–Fraenkel set theory).
The implementation of Isabelle relies on a small LCF-style kernel, meaning that
inferences are implemented as operations on an abstract theorem
datatype. Assuming the kernel is correct, all values of type theorem
are correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the
University of Cambridge and the Technische Universität München.
Isabelle/HOL is available for all major platforms under a BSD-style license
from
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
I expect we will end up in second place (excluding proof output), behind
Satallax, since we haven't upgraded the system since 2016. We will be back,
hopefully in 2018!
LEO-II 1.7.0
Alexander Steen
Freie Universität Berlin, Germany
Architecture
LEO-II
[BP+08],
the successor of LEO
[BK98],
is a higher-order ATP system based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution
[Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E
[Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g., 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Strategies
LEO-II employs an adapted "Otter loop".
Moreover, LEO-II uses some basic strategy scheduling to try different
search strategies or flag settings.
These search strategies also include some different relevance filters.
Implementation
LEO-II is implemented in OCaml 4, and its problem representation language
is the TPTP THF language
[BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure
[SB10].
LEO-II's parser supports the TPTP THF0 language and also the TPTP languages
FOF and CNF.
Unfortunately the LEO-II system still uses only a very simple
sequential collaboration model with first-order ATPs instead of using
the more advanced, concurrent and resource-adaptive OANTS architecture
[BS+08]
as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is
available from
http://www.leoprover.org
Expected Competition Performance
LEO-II ist not actively being developed anymore, hence there are no
expected improvements to last year's CASC results.
Leo-III 1.1
Alexander Steen
Freie Universität Berlin, Germany
Architecture
Leo-III
[SWB16], the successor of LEO-II
[BP+08],
is a higher-order ATP system based on higher-order paramodulation with inference
restrictions using a higher-order term ordering.
Since Leo-III employs a agent-based blackboard architecture, multiple
independent proof search approaches can be run in parallel as so-called
agents.
In version 1.1, each agent runs a sequential proof search based on the
given-clause algorithm as known from E, each with different search strategy.
Leo-III heavily relies on cooperation with external (first-order) ATPs that
are called asynchronously during proof search.
At the moment, first-order cooperation is limited to typed first-order
systems, where CVC4
[BC+11] is used as default external system.
Nevertheless, further external systems (also further higher-order systems or
counter model generators) can be employed using command-line arguments.
If either one of the saturation procedure loops or one of the external provers
finds a proof, the system stops and returns the result.
Strategies
Leo-III runs multiple search strategies in parallel using its agent-based
architecture.
The search strategies differ in the employed relevance filter parameters,
inference parameters, pre-processing techniques and hence the considered
formula set.
The available portfolio of strategies also contains incomplete approaches
that might outperform default search strategies for some problem input classes.
Implementation
Leo-III exemplarily utilizes and instantiates the associated LeoPARD system
platform
[WSB15] for higher-order (HO) deduction systems implemented in
Scala (currently using Scala 2.12).
The prover makes use of LeoPARD’s sophisticated data structures and implements
its own reasoning logic on top, e.g. as agents in LeoPARD’s provided
blackboard architecture
[BSW17].
A generic parser is provided that supports all TPTP syntax dialects.
It is implemented using ANTLR4 and converts its produced concrete syntax
tree to an internal TPTP AST data structure which is then transformed into
polymorphically typed lambda terms.
As of version 1.1, Leo-III supports all common TPTP dialects (CNF, FOF,
TFF, THF) including their polymorphic variants
[BP13,KRS16].
The term data structure of Leo-III uses a spine term representation
augmented with explicit substitutions and De Bruijn-indices.
Furthermore, terms are perfectly shared during proof search, permitting
constant-time equality checks between alpha-equivalent terms.
As pointed out before, Leo-III’s agents may at any point invoke external
reasoning tools.
To that end, Leo-III includes an encoding module that translates (polymorphic)
higher-order clauses to polymorphic and monomorphic typed first-order clauses.
While LEO-II relied on cooperation with untyped first-order provers, we hope
to reduce clutter and therefore achieve better results using native type
support in first-order provers.
Leo-III 1.1 will be available on GitHub after CASC-26:
https://github.com/cbenzmueller/Leo-III
Expected Competition Performance
In contrast to its last version 1.0 (competed at CASC-J8), Leo-III 1.1 has
been improved in several aspects.
Due to the novel cooperation schemes with typed first-order provers, we
strongly expect better results compared to its predecessor LEO-II.
MaLARea 0.6
Josef Urban
Czech Technical University in Prague, Czech Republic
Architecture
MaLARea 0.6
[Urb07,US+08,KUV15] is a metasystem for ATP in large theories where symbol and
formula names are used consistently.
It uses several deductive systems (now E,SPASS,Vampire, Paradox, Mace), as
well as complementary AI techniques like machine learning (the SNoW system)
based on symbol-based similarity, model-based similarity, term-based
similarity, and obviously previous successful proofs.
The version for CASC-26 will mainly use the E prover with the BliStr(Tune)
[Urb13,JU17] large-theory strategies, possibly also Prover9, Mace and Paradox.
The premise selection methods will likely also use the distance-weighted
k-nearest neighbor
[KU13] and E's implementation of SInE.
Strategies
The basic strategy is to run ATPs on problems, then use the machine learner
to learn axiom relevance for conjectures from solutions, and use
the most relevant axioms for next ATP attempts.
This is iterated, using different timelimits and axiom limits.
Various features are used for learning, and the learning is complemented by
other criteria like model-based reasoning, symbol and term-based similarity,
etc.
Implementation
The metasystem is implemented in ca. 2500 lines of Perl.
It uses many external programs - the above mentioned ATPs and machine learner,
TPTP utilities, LADR utilities for work with models, and some standard
Unix tools.
MaLARea is available at:
https://github.com/JUrban/MPTP2/tree/master/MaLARea
The metasystem's Perl code is released under GPL2.
Expected Competition Performance
Thanks to machine learning, MaLARea is strongest on batches of many
related problems with many redundant axioms where some of the problems
are easy to solve and can be used for learning the axiom relevance.
MaLARea is not very good when all problems are too difficult (nothing to
learn from), or the problems (are few and) have nothing in common.
Some of its techniques (selection by symbol and term-based similarity,
model-based reasoning) could however make it even there slightly stronger
than standard ATPs.
MaLARea has a very good performance on the MPTP Challenge, which is a
predecessor of the LTB division, and on several previous LTB competitions.
Princess 170717
Philipp Rümmer
Uppsala University, Sweden
Architecture
Princess
[Rue08,Rue12] is a theorem prover for first-order logic
modulo linear integer arithmetic.
The prover uses a combination of techniques from the areas of first-order
reasoning and SMT solving.
The main underlying calculus is a free-variable tableau calculus,
which is extended with constraints to enable backtracking-free proof
expansion, and positive unit hyper-resolution for lightweight
instantiation of quantified formulae.
Linear integer arithmetic is handled using a set of built-in proof rules
resembling the Omega test, which altogether yields a calculus that is
complete for full Presburger arithmetic, for first-order logic, and for a
number of further fragments.
In addition, some built-in procedures for nonlinear integer arithmetic are
available.
The internal calculus of Princess only supports uninterpreted predicates;
uninterpreted functions are encoded as predicates, together with the usual
axioms.
Through appropriate translation of quantified formulae with functions, the
e-matching technique common in SMT solvers can be simulated; triggers in
quantified formulae are chosen based on heuristics similar to those in the
Simplify prover.
Strategies
For CASC, Princess will run a fixed schedule of configurations for each
problem (portfolio method).
Configurations determine, among others, the mode of proof expansion
(depth-first, breadth-first), selection of triggers in quantified formulae,
clausification, and the handling of functions.
The portfolio was chosen based on training with a random sample of problems
from the TPTP library.
Implementation
Princess is entirely written in Scala and runs on any recent Java virtual
machine; besides the standard Scala and Java libraries, only the Cup parser
library is used.
Princess is available from:
http://www.philipp.ruemmer.org/princess.shtml
Expected Competition Performance
Princess should perform roughly as in the last years.
Compared to last year, the support for outputting proofs was extended, and
should now cover all relevant theory modules for CASC (but not yet all proof
strategies).
Prover9 2009-11A
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for
first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3
[McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
Prover9 has available positive ordered (and nonordered) resolution and
paramodulation, negative ordered (and nonordered) resolution, factoring,
positive and negative hyperresolution, UR-resolution, and demodulation
(term rewriting).
Terms can be ordered with LPO, RPO, or KBO.
Selection of the "given clause" is by an age-weight ratio.
Proofs can be given at two levels of detail:
(1) standard, in which each line of the proof is a stored clause
with detailed justification, and
(2) expanded, with a separate line for each operation.
When FOF problems are input, proof of transformation to clauses
is not given.
Completeness is not guaranteed, so termination does not indicate
satisfiability.
Strategies
Prover9 has available many strategies; the following
statements apply to CASC.
Given a problem, Prover9 adjusts its inference rules and strategy according
to syntactic properties of the input clauses such as the presence of
equality and non-Horn clauses.
Prover9 also does some preprocessing, for example, to eliminate predicates.
For CASC Prover9 uses KBO to order terms for demodulation and for the
inference rules, with a simple rule for determining symbol precedence.
For the FOF problems, a preprocessing step attempts to
reduce the problem to independent subproblems by a miniscope
transformation; if the problem reduction succeeds, each
subproblem is clausified and given to the ordinary search
procedure; if the problem reduction fails, the original
problem is clausified and given to the search procedure.
Implementation
Prover9 is coded in C, and it uses the LADR libraries.
Some of the code descended from EQP
[McC97].
(LADR has some AC functions, but Prover9 does not use them).
Term data structures are not shared (as they are in Otter).
Term indexing is used extensively, with discrimination tree indexing for
finding rewrite rules and subsuming units, FPA/Path indexing for finding
subsumed units, rewritable terms, and resolvable literals.
Feature vector indexing
[Sch04]
is used for forward and backward nonunit subsumption.
Prover9 is available from
http://www.cs.unm.edu/~mccune/prover9/
Expected Competition Performance
Prover9 is the CASC fixed point, against which progress can be judged.
Each year it is expected do worse than the previous year, relative to the
other systems.
Satallax 3.0
Michael Färber
Universität Innsbruck, Austria
Architecture
Satallax 3.0
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the proof search.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
Problems are given in the THF format.
Proof search:
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses
[Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat to test the current set of propositional
clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
Optionally, Satallax generates first-order formulae in addition to the
propositional clauses.
If this option is used, then Satallax periodically calls the first-order
theorem prover E to test for first-order unsatisfiability.
If the set of first-order formulae is unsatisfiable, then the original branch
is unsatisfiable.
Upon request, Satallax attempts to reconstruct a proof which can be output
in the TSTP format.
Strategies
There are about 140 flags that control the order in which formulae and
instantiation terms are considered and propositional clauses are generated.
Other flags activate some optional extensions to the basic proof procedure
(such as whether or not to call the theorem prover E).
A collection of flag settings is called a mode.
Approximately 500 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Satallax 3.0 has a strategy schedule consisting of 54 modes (15 of which make
use of E).
Each mode is tried for time limits ranging from less than a second to about
90 seconds.
The strategy schedule was determined through experimentation using the THF
problems in version 6.3.0 of the TPTP library.
Implementation
Satallax is implemented in OCaml.
A foreign function interface is used to interact with MiniSat 2.2.0
Satallax is available at:
http://satallaxprover.com
Expected Competition Performance
Satallax 3.0 is the CASC-J8 THF division winner.
Satallax 3.2
Michael Färber
Universität Innsbruck, Austria
Architecture
Satallax 3.2
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the proof search.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
Problems are given in the THF format.
Proof search:
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses
[Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat to test the current set of propositional
clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
Optionally, Satallax generates first-order formulae in addition to the
propositional clauses.
If this option is used, then Satallax periodically calls the first-order
theorem prover E
[Sch13] to test for first-order unsatisfiability.
If the set of first-order formulae is unsatisfiable, then the original branch
is unsatisfiable.
Upon request, Satallax attempts to reconstruct a proof which can be output
in the TSTP format.
The proof reconstruction has been significantly changed since Satallax 3.0
in order to make proof reconstruction more efficient and thus less likely to
fail within the time constraints.
Strategies
There are about 150 flags that control the order in which formulae and
instantiation terms are considered and propositional clauses are generated.
Other flags activate some optional extensions to the basic proof procedure
(such as whether or not to call the theorem prover E).
A collection of flag settings is called a mode.
Approximately 500 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Before deciding on the schedule to use, Satallax parses the problem and
determines if it is big enough that a SInE-based premise selection algorithm
[HV11] should be used.
If SInE is not activated, then Satallax 3.2 uses a strategy schedule
consisting of 37 modes.
Each mode is tried for time limits ranging from less than a second to just
over 1 minute.
If SInE is activated, than Satallax is run with a SInE-specific schedule
consisting of 20 possible SInE parameter values selecting different premises
and some corresponding modes and time limits.
Implementation
Satallax is implemented in OCaml, making use of the external tools MiniSat
(via a foreign function interface) and E.
Satallax is available at:
http://satallaxprover.com
Expected Competition Performance
The addition of a SInE-like procedure for premise selection means Satallax
should be able to solve some large problems that were previously out of reach.
In addition, the changes to the way TSTP proofs are generated should mean that
proofs are more likely to be constructed and reported after a proof has been
found.
We hope that this will be reflected in an improved performance over
Satallax 3.0 from last year.
Scavenger EP-0.1 and EP-0.2
Bruno Woltzenlogel Paleo
Australian National University, Australia
Architecture
Scavenger
[ISW17] is a theorem prover based on the new Conflict Resolution calculus
[SW17].
At the proof-theoretical level, Conflict Resolution (CR) is a generalization
of the conflict-driven clause learning (CDCL) principle to first-order logic.
CR derivations are isomorphic to implication graphs (as maintained by
SAT-solvers): every unit-propagating resolution inference corresponds to a
new propagated literal in the graph; every assumption/decision corresponds to
a decision literal in the graph; and every conflict inference corresponds to
a conflict in the graph.
CR's clause learning inference learns a disjunction of negations of
instances of the decision literals that are ancestors of the conflict,
using the compositions of the unifiers on the paths from the decisions to the
conflict.
From a natural deduction point of view, CR's clause learning inference rule
generalizes implication/negation introduction by taking unification into
account and considering several assumptions at once
[Wol16].
In this sense, it does to Gentzen's implication/negation introduction what
Robinson's resolution did to implication elimination (a.k.a. modus ponens).
The architecture of Scavenger attempts to be similar to the architecture of
SAT-solvers, but data structures typically used in sat-solvers (e.g.,
Two-Watched-Literals) cannot be easily and efficiently generalized to
first-order logic.
Because of that, Scavenger's architecture also has a "taste" of saturation.
For example, whereas in a SAT-solver propagation causes a literal to be
assigned (either true or false), in Scavenger, propagation often requires
generation of an instance of a literal, and it is this generated instance
that is assigned.
Strategies
Proof search in the Conflict Resolution calculus presents unique challenges.
For example, in contrast to what happens in the propositional case,
unit-propagation may not terminate. Scavenger is an experimental prover, and
such challenges have been dealt with in various ways
[ISW17].
Scavenger-EP-0.1 is one of the three versions evaluated in
[ISW17].
It simply ignores the non-termination of unit-propagation (and hence is
incomplete).
(Scavenger-TD-0.1 and Scavenger-PD-0.1 maintain completeness by iteratively
deepening the propagation and making decisions eagerly.
However, on TPTP problems they did not perform as well as Scavenger-EP-0.1,
and therefore will not participate in CASC this year.)
Scavenger-EP-0.2 extends Scavenger-EP-0.1 from CNF without equality to FOF
with equality.
However, equality reasoning is done in a naive way: (instances of) equality
axioms are added to the problem when needed.
Scavenger-EP-0.2 also implements a VSIDS heuristic for decision literal
selection and optimizes unification and some data structures.
Implementation
Scavenger is implemented in Scala and runs on the Java Virtual Machine.
Terms and formulas are simply typed lambda terms.
Clauses are two-sided sequents (pairs of lists of positive and negative atomic
formulas).
Inference rules are classes with assertions that ensure their soundness.
A hashmap is used to allow quicker detection of propagating clauses (in an
attempt to generalize two-watched-literals to first-order logic).
Scavenger is available at:
https://gitlab.com/aossie/Scavenger/
Expected Competition Performance
Both competing versions of Scavenger are expected to perform better on
effectively propositional problems (where the non-termination of
unit-propagation is not an issue) than on problems that are not in this
fragment.
Scavenger-EP-0.1 has been evaluated in
[ISW17], and similar performance is expected in CASC.
Scavenger-EP-0.2 has not been thoroughly evaluated yet.
It is hoped that it will perform better than Scavenger-EP-0.1.
Vampire 4.0
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire 4.0 is an automatic theorem prover for first-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-gen calculus and a MACE-style finite model builder.
Splitting in resolution-based proof search is controlled by the AVATAR
architecture, which uses a SAT solver to make splitting decisions.
Both resolution and instantiation based proof search make use of global
subsumption.
A number of standard redundancy criteria and simplification techniques are used
for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
Sine axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.0 provides a very large number of options for strategy selection. The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes
of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are preferred
for inference selection.
- Set-of-support strategy.
- Ground equational reasoning via congruence closure.
- Evaluation of interpreted functions.
- Extensionality resolution with detection of extensionality axioms
Implementation
Vampire 4.0 is implemented in C++.
Expected Competition Performance
Vampire 4.0 is the CASC-J8 FOF and LTB division winner.
Vampire 4.1
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire
[KV13]
4.1 is an automatic theorem prover for first-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-gen calculus
[Kor13]
and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture
[Vor14]
which uses a SAT or SMT solver to make splitting decisions.
Both resolution and instantiation based proof search make use of global
subsumption
[Kor13].
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.1 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
[RV03].
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes
of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are
preferred for inference selection.
- Set-of-support strategy.
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions.
- Use of Z3
[dMB08]
with AVATAR to restrict search to ground-theory-consistent splitting
branches.
- Extensionality resolution
[GK+14]
with detection of extensionality axioms.
Implementation
Vampire 4.1 is implemented in C++.
Expected Competition Performance
Vampire 4.0 is the CASC-J8 TFA and FNT division winner.
Vampire 4.2
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire
[KV13] 4.2 is an automatic theorem prover for first-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-gen calculus and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture which uses a SAT or SMT solver to make splitting decisions
[Vor14,RB+16].
Both resolution and instantiation based proof search make use of global
subsumption.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.2 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
[LRS]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
[Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of
comparing literals
[HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred
for inference selection.
- Set-of-support strategy.
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions.
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent
splitting branches
[RB+16].
- Specialised theory instantiation and unification
- Extensionality resolution with detection of extensionality axioms
Implementation
Vampire 4.2 is implemented in C++. It makes use of minisat and z3.
Expected Competition Performance
Vampire 4.2 should be an improvement on Vampire 4.1, which won 4 divisions
last year.
Most changes have happened in parts relevant to TFA, some small changes in
parts relevant to model building (EPR and FNT), and some general improvements
to preprocessing that will effect all tracks.
Zipperposition 1.1
Simon Cruanes
Inria Nancy, France
Architecture
Zipperposition is a superposition-based theorem prover for typed
first-order logic with equality.
It features a number of extensions that include polymorphic types; linear
arithmetic on integers and rationals using a specialized set of first-order
inference rules; datatypes with structural induction; user-defined rewriting
on terms and formulas ("deduction modulo theories"); a lightweight variant
of AVATAR for boolean case splitting; first-class booleans
[KK+16]; and (very experimental) support for higher-order logic, extending
first-order rules to higher-order terms using a customized variant of pattern
unification.
The core architecture of the prover is based on saturation with an
extensible set of rules for inferences and simplifications.
The initial calculus and main loop were imitations of an old version of E
[Sch02], but there are many more rules nowadays.
A summary of the calculus for integer arithmetic and induction can be found
in [Cru15].
Strategies
The system does not feature advanced strategies: only one saturation
loop with pick-given ratio and clause selection heuristics is used.
No tuning specific to CASC was made.
Implementation
The prover is implemented in OCaml, and has been around for five years.
Term indexing is done using discrimination trees (non perfect ones for
unification, perfect ones for rewriting) as well as feature vectors
for subsumption.
Some inference rules such as contextual literal cutting make heavy use of
subsumption.
The code can be found at
https://github.com/c-cube/zipperposition
and is entirely free software (BSD-licensed).
Zipperposition can also output graphic proofs using graphviz.
Some tools to perform type inference and clausification for typed
formulas are also provided, as well as a separate library for dealing
with terms and formulas
[Cru15].
Expected Competition Performance
The prover is expected to have decent performance on first-order theorems,
hopefully beating prover9; decent performance in arithmetic (ignoring
the lack of real arithmetic), but behind more sophisticated provers
such as Vampire or CVC4; and poor performance on THF problems except
those that only require first-class booleans.