The MPTP $100 Challenges

This Challenge is Completed


The MPTP $100 Challenges are sets of classical first-order reasoning problems, expressed in the TPTP language, to be proved by fully automated reasoning systems, within specified reasonable resource constraints. The challenge problems are based on the Mizar Mathematical Library. Details of the challenge problems, and the system execution environment, are described below.

The goal of the MPTP challenges is to boost the development of Automated Theorem Proving (ATP) and Artificial Intelligence (AI) methods and tools for reasoning in large theories that involve large numbers of consistently used concepts. Such tools are increasingly needed for proof assistance over large formal mathematical knowledge bases, and also more generally for reasoning over any sufficiently semantically specified "real-world" knowledge bases.

The winners of the MPTP challenges will be announced at CADE-21, and will each receive $100 in real US dollars ... who says there's no money in ATP!

The Mizar Mathematical Library

Mizar is a formal language derived from the mathematical vernacular. It is characterized by the intuitive Jaskowski-style of natural deduction. It is also a proof checker for this language, and a long-term project focused on formalization of mathematics. There are two important features that distinguish Mizar from other proof assistants: Each article in the MML typically contains several theorems, along with some set-theoretical definitions of various mathematical notions. The theorems are proved by natural deduction, from axioms (including definitions) of set theory, and using previously proved theorems as lemmas. The dependencies between theorems, in the sense of using previously proved theorems as lemmas the proofs of subsequent theorems, form a directed acyclic graph (DAG). The leaves of the DAG are the axioms of set theory (including definitions), and the internal and root nodes are Mizar theorems. Edges run from the axioms and lemmas used in each proof, to the the proved theorem. The DAG has a cannonical topological sorting (i.e. a DAG-compatible linear order) called the MML order. This linear order is given mainly by the chronological order of article submissions to the MML.

The MPTP project makes Mizar theorems available as first order ATP problems, by translating the axioms, lemmas, and conjecture to plain first order logic using relativization of types and de-anonymization of abstract terms. Additionally, necessary background information that is known to the Mizar checker, e.g., about the type hierarchies used, is included as axioms. The resultant ATP problems are exported in the TPTP language.

The Challenges

A sub-DAG of the Mizar theorem graph, as shown in the heading above, has been selected as the basis for the challenge (click on the image for a larger version). The root of this DAG is the general topological theorem relating compactness in terms of open covers and in terms of nets (generalization of sequences). It says that every net on a compact topological space has a cluster point. A converse implication holds too, i.e., compact spaces are exactly those with every net having cluster points. This equivalence is the topological generalization of the Bolzano-Weierstrass theorem. Only one implication was chosen for the MPTP challenge, in order to make the size of the DAG reasonable. The DAG forms a bush of theorems, in which each branch from an leaf to the root forms a chain of theorems that Mizar uses as lemmas in proofs of subsequent theorems in the chain. For the MPTP challenge, each internal and root node of the DAG is used to form two ATP problems, as described below. You can view the Mizar presentation of just the MPTP challenge problems, in topdown (reverse MML) order, with access to the Mizar justifications.

The challenge is divided into two divisions: the bushy division, and the chainy division.

Here are the challenge problems in TPTP format (this distribution includes some bugfixes, and supercedes the original.)

Systems are entered into the challenge by submitting the source code, with installation instructions, to the organizers by the CASC system installation deadline (just to make Geoff's life easier :-). Systems may be pretuned for the challenge, but not to the extent of recording specific information about individual problems. Systems may retune themselves in any way during execution, including recording specific information about encountered problems. The source code will be publically available from this WWW site, providing a research resource and public scrutiny to discourage any cheating.

Execution of a system in each division is governed by an overall CPU time limit of 300s times the number of problems. Systems may accept, as a command line parameter, the name of a file that lists the names of all the problem files (one problem file name per line), and then schedule the use of the overall time limit internally. Alternatively, the organizers can run the system independently on each problem with a 300s limit - the same command line is used for all problems (in this mode, if a problem is solved in less than 300s the remaining time is forfeit, i.e., it's smarter to use the overall limit mode).

The winner of each division is the system that solves the most problems. In the event of a tie, the system with the lowest average CPU time used over the problems solved is the winner.

The MPTP challenge is organized by Josef Urban and Geoff Sutcliffe. Questions should be emailed to the organizers after you've checked the FAQ for the answer.

Results Accumulated - Mon Jul 16 12:18:43 EDT 2007

Interactive SVG presentation
(Thanks to Steven Trac)
Mouse over nodes to see which systems have solved the corresponding problem. Mouse over and click the progress bars for useful effects.
Raw Bushy and Chainy data
(So you can grep and sort and ...)