0.1 The Game of Go and Theorem Proving
Formally defined rules [similarity].
Both the games of Go and theorem proving have algorithms to evaluate the results. In the game of Go, one can judge the result of each game when it is over by counting the stones and spaces for each player, and no ambiguity is left in deciding the result of a game. In theorem proving, when one finds a proof, others can systematically check if the alleged proof is a valid proof or not.
Expressive power of the system [difference].
Even though both systems are based on a set of simple rules, the expressive power of these systems differ. Depending on the underlying logics, a theorem proving task can involve advanced concepts such as abstraction, universal quantification, existential quantification, and polymorphism, which Go scores cannot express natively. This is especially true for more expressive logics such as classical higherorder logics or variants of calculus of constructions, where stronger proof automation is needed.
Amount of available training data [difference].
Some theorem proving researchers boast that they have large proof corpora. For example, the Isabelle community has the Archive of Formal Proofs (AFPs) [1], consisting of more than 1.5 millions of lines of code and 100 thousands lemmas. Unfortunately, even though these proof corpora are large for the small community of theorem provers, they are small compared to the data deployed in other domains.
Preference towards small data [difference].
The size of the community is not the only reason of small data available in the theorem proving community: logicians and mathematicians have developed expressive logics to describe general ideas in a concise manner. Combined with the tradeoff between proof automation and expressive power of underlying logic, this is doubly unfortunate: the more expressive logic we use, the less proof automation we have, but the more expressive the logic is, the less training data we can expect, which makes it hard to improve the proof automation for expressive logics using machine learning techniques.
Selfplayability [similarity/difference].
One might suspect that large data are not necessary to develop a powerful proof automation tool using machine learning. After all, DeepMind has made AlphaGo Zero [3] stronger than any previous versions of AlphaGo via selfplay without using data from human games. Unfortunately, even though both Go and theorem proving are based on clearly defined rules, theorem proving is not a twoplayer game by default. In the rest of this paper, we propose an approach to introducing selfplayability to theorem proving.
0.2 The Design of Selfplayable Games of Theorem Proving
One straightforward design of selfplayable games of theorem proving is as follows: (1) prepare a set of proof obligations from existing proof corpora, (2) let two competing provers try to prove these proof obligations, (3) count how many obligations each prover discharges, (4) consider the prover that solves more obligations as the winner, and the other one as the loser. We can use this naive approach as a part of reinforcement learning or evolutionary computation to optimize our provers for proof obligations that have already been proved. However, this approach is probably not powerful enough to improve provers for conjectures that are significantly different from the theorems in the training data
For example, let us assume that we enhance our prover, say P
, via selfplay using 100 theorems and their proofs in the AFPs. Since we already know how to prove these theorems, we can improve P
, so that P
can prove all of the 100 theorems within a reasonable timeout. However, when we try to improve P
to discharge a new conjecture, say Goldbach’s conjecture, we will find ourselves at a loss of training data: Currently, there is no known proofs of Goldbach’s conjecture or auxiliary lemmas that are verified to be useful to prove Goldbach’s conjecture.
If we add Goldbach’s conjecture to the above dataset, the improvement via selfplay would saturate after producing a prover that can discharge the 100 theorems from the AFP but not Goldbach’s conjecture:
since the gap between the theorems from the AFPs and Goldbach’s conjecture is too large, minor mutations to P
’s variants cannot produce a useful observable difference in the result of the game. What we need here is a mechanism to produce conjectures that we can reasonably expect to be useful to prove our target conjecture (Goldbach’s conjecture in this example) but not too difficult for our current prover P
.
Therefore, we propose to treat conjecturing and proof search as one problem
. Of course, we cannot be 100% sure which conjecture is useful to train our prover for Goldbach’s conjecture, since nobody has proved it yet. But if we consider a heuristic proof search as the exploration of an andor tree, we can estimate how important each node in the tree is from the search heuristics of the prover. Furthermore, given a long timeout, we can expect that the prover can discharge some of emerging subgoals, even if the prover cannot discharge the rootnode, which corresponds to the target conjecture (Goldbach’s conjecture, in this example).
Our idea is to use these proved subgoals to judge the competence of other versions of prover P
.
First, we produce two versions of our prover P
by mutation. Let us call them Pa
and Pb
, respectively. Using the approach explained above, we let Pa
produce a dataset Da
and let Pb
produce Db
. Now, we let Pb
try to prove the theorems in Da
, and let Pa
try to prove the theorems in Db
.
When Pa
and Pb
run out of time, we sum up the estimated values of theorems proved by each prover (Pa
, for example). Note that it was the opponent prover (Pb
in this example) that has decided the value of each theorem in each dataset (Db
in this case) when finding proofs of these subgoals for the first time. The prover that has gained more value is the winner of the game, and the other is the loser. Then, we keep running this game by mutating the winner until we produce a prover that can discharge the target conjecture. Since this process generates new conjectures tagged with their estimated values from the target conjecture in each iteration, we expect this approach continues producing conjectures useful to prove the target conjecture.
We are still in the early stage of the design. We might generalize this idea to nplayer games to avoid overfitting. For the moment, we prefer the design of the game to be irrelevant to any of underlying logics, ML algorithms for search heuristics, and mutation algorithms.
Acknowledgement
This work was supported by the European Regional Development Fund under the project AI&Reasoning (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000466).
References
 [1] Gerwin Klein, Tobias Nipkow, Larry Paulson, and Rene Thiemann. 2004.

[2]
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George
van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas
Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal
Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray
Kavukcuoglu, Thore Graepel, and Demis Hassabis.
Mastering the game of go with deep neural networks and tree search.
Nature, 529(7587):484–489, 2016.  [3] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of go without human knowledge. Nature, 550:354 EP –, Oct 2017. Article.
Comments
There are no comments yet.