Systems Revealed as Unsound by TPTP Testing


System When How
MELIA 0.1.2 October 2011 KRS040+1 found to be Unsatisfiable, MGT037+2 found to be Theorem, PUZ133+3 found to be Theorem, RNG127+1 found to be Unsatisfiable.
iProver-Eq 0.7 October 2011 SET960+1 SET961+1 SET977+1 found to be CounterSatisfiable
Nitrox 0.1 September 2010 Unsound on CNFs (due to missing quantifiers), e.g., NUM005-1
IsabelleP 2009-2 September 2010 SMT tactic claimed SEV378^5 and SEV381^5 are theorems
E-Darwin 1.2 May 2010 Claimed SYN072-1 and SYN914-1 are satisfiable
IsabelleN 2009 (Nitpick 1.2) November 2009 Claimed SET640^3 SET646^3 SET651^3 SET657^3 to be CSA
LEO-II 1.0 October 2009 Found SYO137^5 to be a THM
OSHL 0.6 October 2009 Many UNS found to be SAT
Darwin 1.2 October 2009 10 THM found to be CSA (e.g.,, CSR050+1) and 8 UNS found to be SAT (e.g., MGT006-1)
SPASS 3.01 June 2009 SWV555-1.010 found to be SAT (it's UNS)
Theo 2006 June 2009 Found SWV482+1 to be a THM
TPS 3.0 in GOOD1 mode June 2009 Various modes found to be unsound
LEO-II 0.99 December 2008 Found SYN989^2 to be UNS
E-Darwin 1.0 December 2008 Found HWV038-1, MGT034-1, SET781-2 to be UNS (they are SAT)
Metis 2.1 December 2008 Found some UNS to be SAT, some THM to be CSA
OSHL-S 0.1 December 2008 Found some UNS to be SAT
E=KRHyper 1.1 November 2008 Found some CSA to be THM
E 0.999 February 2008 Finds SYN551+1 CSA
DCTP 10.21p February 2008 Finds PUZ066+1 UNS, SEU240+1 CSA, SYN071+1 CSA
E-KRHyper 1.0 June 2007 Finds SYN426-1 UNS
Metis 2.0 June 2007 Finds KRS174+1 CSA
E-SETHEO csp04 July 2006 Finds PUZ066+1 UNS and SYN071+1 CSA
Darwin J3 June 2006 Finds GEO153-1 (Tarski's geometry axioms) unsatisfiable
Gandalf c-2.6 --sat June 2005 Finds SYN487-1 SYN508-1 SYN914-1 to be satisfiable
Vampire 7.0 June 2005 Finds proof for SYN322+1. Andrei explains "The plain mode version uses --miniscope on, I knew it was buggy and turned it off for the CASC version."
SNARK 20050129b May 2005 Technicality - interprets cons/2 and hence finds some proofs that don't exist for problem that use cons as a predicate, e.g., COL094-2
Mace2 2.2 June 2004 Found SYN071-1 to be satisfiable, while testing for CASC-J2
Mace4 2003-B May 2004 Bug in clausifier exposed by finding ALG140+1 to be satisfiable
E-SETHEO csp03 May 2004 Found SWC128-1 to be satisfiable
CiME 2 July 2003 Found contradiction in output, both ways, e.g.,
BOO036-1 has satisfiable status; CiME 2 says it has found a theorem
COL053-1 has unsatisfiable status; CiME 2 says it has found a satisfiable
TGTP C-15 June 2003 Claessen pointed out TGTP proofs for satisfiable sets
MUSCADET 2.3 June 2002 Found a proof for non-theorems, e.g., NLP043+1.p
MUSCADET 2.0 June 2001 Failed soundess testing for CASC-JC, due to an internal use of a symbol that occured as a predicate. Fixed for CASC-JC.
Gandalf c2.1 March 2001 Reported that some satisfiable sets were unsatisfiable. This is the version of Gandalf that competed in CASC-17. Found in testing by Geoff.
Vampire 1.17 March 2001 Reported that some unsatisfiable sets were satisfiable. Found in testing of this non-public Vampire release, by Geoff.
E 0.5 August 1999 See the E-SETHEO report above. Disqualified from being ranked in CASC-16, but the data from the competition is still valid.
E-SETHEO 99csp August 1999 Change in TPTP problem status (set as unsatisfiable by E-SETHEO results data submitted to the collection) conflicted with the results data. Problem traced to the E 0.5 component. Disqualified from being ranked in CASC-16, but the data from the competition is still valid.
HOTTER 0.0 CASC-16, July 1999 Failed soundness testing. One binary used in both the prover and the model generator. Failed first one way, repaired and failed the other way, repaired again and failed back the first way. Both versions disqualified by panel.
Vampire 0.0 May 1999 Data submitted to the results collection conflicted. Repaired for CASC-16.
Bliksem 1.00 August 1998 Busted after CASC-15, after an "impressive performance". Disqualified from CASC-15. Repaired in Bliksem v1.01.
PROTEIN V2.00 (I think) Early 1997 (I think) Data submitted to the results collection conflicted. Repaired for CASC-14.
I-THOP Early 1997 Data submitted to the results collection conflicted. Repaired for CASC-14.
Violet CASC-13, July 1996 Failed soundness testing, due to error in equality transformation. Repaired in time for the competition.
CLIN Ages ago Data submitted to the results collection conflicted. Can't recall how this was resolved.
Otter 3.0.? Ages ago Busted on a technicality, as Otter interpreted any predicate starting with eq as equality. TPTP has predicates other than equal/2 starting with eq. New Otter flag tptp_eq introduced to fix it.