Introduction
The TPTP (Thousands of Problems for Theorem Provers) is a library of
problems for Automated Theorem Proving (ATP) systems.
The principal motivation for the TPTP project is to move the testing and
evaluation of ATP systems from the previously ad hoc situation onto a
firm footing.
This became necessary, as results being published do not always
accurately reflect the capabilities of the ATP system being considered.
A common library of problems is necessary for meaningful system evaluations,
meaningful system comparisons, repeatability of testing, and the production
of statistically significant results.
The TPTP is such a library.
The TPTP provides a simple, unambiguous, source and reference mechanism for
ATP problems.
It is comprehensive and up-to-date, and thus provides an overview of
the current application of ATP.
The TPTP problems are stored in a specifically designed, easy to understand
format.
Utilities for manipulating the problems, for converting the problems
to other known ATP formats, and for finding problems with certain
characteristics, are provided.
Since its first release in 1993, many researchers have used the TPTP as
an appropriate and convenient basis for ATP system evaluation.
This technical report explains the motivations and reasoning behind the
development of the TPTP, and thus implicitly explains the design decisions
made.
It also serves as a manual explaining the structure and use of the TPTP:
it provides a full description of the TPTP contents and organization, details
of the utility programs, and guidelines for obtaining and using the TPTP.
| A Quick Installation Guide for the TPTP is given in
Section
Getting and Using the TPTP.
Please be sure to read the guidelines for using TPTP
problems and presenting results, given in
Section Using the TPTP.
|
What's New in TPTP
v2.7.0
:
- In the FOF part of the TPTP there are 222 new abstract problems,
resulting in 248 new problems, in the domains AGT, ALG.
There have been 5 bugfixes.
- In the CNF part of the TPTP there are 31 new abstract problems,
resulting in 51 new problems, in the domains
ALG COL COM GEO HWV PUZ SET SWV SYN.
There have been 65 bugfixes done.
- The problem ratings have been updated.
- One new domain has been added:
- The tptp2X utility has been extended and improved:
- New formats omdoc, mace4, math.
Previous Problem Collections
A large number of interesting problems have accumulated over the years in the
ATP community.
Besides publishing particularly interesting individual problems, from early on
researchers collected problems in order to obtain a basis for experimentation.
Problems in FOF, published by mathematicians and logicians prior to the
mechanization of reasoning (e.g.,
[Chu56]) provided the first source for
ATP researchers.
The first major publication in this regard was
[MOW76], which provides
an explicit listing of clauses for 63 CNF problems, many of which are still
relevant today
(to our knowledge, the first circulation of problems for testing ATP
systems was due to Larry Wos in the late sixties).
In the same year
[WM76] documented 86 CNF problems which have commonly
been used for ATP testing.
However, the problem clauses are not supplied in
[WM76].
A second major thrust was provided by
[Pel86], which lists 75
problems in both FOF and CNF.
Other more recent papers are
[BL+86],
[Qua92],
[MW92], and
[McC93], to name a few.
The Journal of Automated Reasoning's Problem Corner also provided interesting
challenge problems.
Problems published in hardcopy form are, however, often not suitable for
testing ATP systems, because they have to be transcribed to electronic form.
This is a cumbersome, error-prone process, and is feasible for only small
numbers of small problems.
The sparsness of research into ATP systems for FOF problems meant that
no electronic collections of FOF test problems have previously been commonly
available.
A CNF problem collection in electronic form was made publicly available by
Argonne National Laboratories (in Otter format
[McC94])
in 1988
[ANL].
This collection was a major source of problems for ATP researchers.
Other electronic collections of CNF problems are available, but have not been
announced officially (e.g., that distributed with the SPRFN ATP system
[SPRFN]).
Although some of these collections provide significant support to researchers,
and formed the early core of the TPTP library, none (with the possible
exception of the ANL collection) was specifically designed to serve as a common
basis for ATP research.
Rather, these collections typically were built in the course of research into
a particular ATP system.
As a result there are several factors that limit their usefulness as a common
basis for research.
In particular, previously existing problem collections:
- are often hard to discover and obtain.
System development and system evaluations typically rely on a
small set of test problems, depending on the collections of problems
available to the researcher.
- need to be transformed to the syntax of the ATP system being considered.
The problem format used in a collection may not be appropriate for the
desired purpose, and a comparatively large effort is required just to make
the problems locally usable (which in practice often means that such a
collection of problems is simply ignored).
- are often limited in scope and size.
The problems used are often homogeneous, and thus cannot be used
for broad testing of the capabilities of the ATP system under
consideration.
If there are too few problems, statistically significant testing is
not possible.
- may be outdated.
The problems may insufficiently reflect the current state-of-the-art in
ATP research.
- are sometimes designed and tuned (regarding formula selection,
formula ordering, and formula structure) for a particular ATP system.
Using a collection designed and tuned for a particular ATP system may
lead to biases in results.
- provide no indication of the significance or difficulty of the problems.
The significance and difficulty of a problem, with respect to the current
state-of-the-art in ATP systems, is hard to assess by newcomers to the
field.
Existing test problems are often not adequate anymore (e.g., Schubert's
Steamroller
[Sti86]), while others may be solvable only with
specialized techniques (e.g., LIM+
[Ble90]),
and therefore are much too hard to start with.
- are inconsistent in their presentation of equally named problems.
Many copies and variants of the same ``original'' problem may exist in
different collections.
This means that unambiguous identification of problems, and therefore a
clear interpretation of performance figures for given problems, is
difficult.
- are usually undocumented.
It is hard to obtain information on problem semantics, the
original problem source, and the particular style of axiomatization.
This also contributes to the problem of unambiguous problem identification.
- are almost always unserviced.
They do not provide a mechanism for adding new problems or
correcting errors in existing problems, and cannot be used to
electronically distribute new and corrected problems to the ATP community.
This in turn perpetuates the use of old and erroneous problems.
- provide no guidelines for their use.
Quite often, inadequate system evaluations are performed.
As a consequence, results that provide little indication
of the system properties are reported.
The problem of meaningfully interpreting results can be even worse than
indicated.
A few problems may be selected and hand-tuned (formulae arranged in a special
way, irrelevant formulae omitted, lemmas added in, etc) specifically for the
ATP system being tested.
The presentation of a problem can significantly affect the nature of the
problem, and changing the formulae clearly makes a different problem
altogether.
Nevertheless the problem may be referenced under the same name as it was
presented elsewhere.
As a consequence the experimental results reveal little.
Some researchers avoid this ambiguity by listing the formulae explicitly,
but obviously this usually cannot be done for a large number of problems or
for large individual problems.
The only satisfactory solution to these issues is a common and stable library
of problems.
The TPTP is such a library.
What is Required?
The goal for building the TPTP has been to overcome previous drawbacks, and
to centralize the burden of problem collection and maintenance.
The TPTP tries to address all relevant issues.
In particular, the TPTP:
- is easy to discover and obtain.
Awareness of the TPTP is assured by extensive formal and informal
announcements.
The TPTP is available via the internet (WWW and FTP), and is thus easily
available to the research community.
- is easy to use.
Problems are presented in a specifically designed, easy to understand
format. Automatic conversion to other known formats is also provided,
thus eliminating the necessity for any other transcription.
- spans a diversity of subject matters.
This reduces biases in the development and testing of ATP systems,
which arise from the use of a limited scope of problems.
It also provides an overview of the domains in which ATP systems are
used.
- is large enough for statistically significant testing.
In contrast to common practice, an ATP system should be evaluated over
a large number of problems, rather than a small set of judiciously
selected examples. The large size of the TPTP makes this possible.
- is comprehensive.
The TPTP contains most problems known to the community.
There is no longer a need to look elsewhere.
- is up-to-date.
As new problems appear in the literature and elsewhere (see
Section Inside the TPTP),
they are added to the TPTP as soon as possible.
- is independent of any particular ATP system.
The problems are arranged so as to be modular and human-readable,
rather than arranged for a particular ATP system.
- contains problems varying in difficulty, with a difficulty rating for
each problem.
The difficulty of problems in the TPTP ranges from very simple problems
through to open problems.
This allows all interested researchers, from newcomers to experts, to
rely on the same problem library.
The difficulty ratings provided with each problem are important for
several reasons:
- It simplifies problem selection according to the user's intention.
- It allows the quality of an ATP system to be judged.
- Over the years, changes in the problem ratings will provide an
indicator of the advancement in ATP.
- provides statistics for each problem and for the library as a whole.
This provides information about the syntactic nature of the problems.
- has an unambiguous naming scheme.
This provides unambiguous problem reference, and makes the comparison of
results meaningful.
See Section
Problem and Axiomatization Naming for details.
- is well structured and documented.
This allows effective and efficient use of the library.
Useful background information, such as an overview of ATP application
domains, is provided.
- documents each problem.
This contributes to the unambiguous identification of each problem.
- provides a mechanism for adding new problems.
The TPTP contains standard axiomatizations that can be used in new
problems.
This simplifies the construction of new problems (see
Section Please contact us if).
A template is provided for submission of new problems.
The TPTP is thus a channel for making new problems available to the
community, in a simple and effective way.
- provides a mechanism for correcting errors in existing problems.
All errors, noticed by the developers or reported by users,
are corrected.
Patched TPTP releases are made regularly.
- provides guidelines for its use in evaluating ATP systems.
A standard library of problems together with evaluation guidelines
makes reported results meaningful and reproducible by others.
This will in turn simplify and improve system comparisons, and
allow ATP researchers to accurately gauge their progress.
The development of the TPTP problem library is an ongoing project, with the
aim to provide all of the desired properties.
Current Limitations of the TPTP.
The current release of the TPTP library is limited to problems expressed in 1st
order logic.
There are no problems for induction, or for non-classical theorem proving.
However, see Section
Present and Future Activity for upcoming and planned extensions.