Google DeepMind's LEAP framework lifts the formal theorem-proving success rate of general-purpose LLMs from under 10% to 70% on the new Lean-IMO-Bench, solves all 12 Putnam 2025 problems in Lean 4, and autonomously produces more than 5,000 lines of kernel-verified code for a Knuth combinatorics subproblem. All results are verified by the Lean compiler, and the Putnam results use no natural-language (NL) seeding. Claim audit: closed-source competitors were excluded, and the faithfulness of the formal statements was not independently audited.
June 12, 2026 · 10:17 AM
`sorry` serves as a legal placeholder for proposed subgoals, but never appears in final outputs.

| System | Putnam 2025 |
|---|---|
| Gemini 3.1 Pro (standalone) | 0 / 12 |
| Goedel-Prover-V2 32B | 0 / 12 |
| Hilbert | 4 / 12 |
| Aristotle | 9 / 12 |
| LEAP | 12 / 12 |

| System | Basic (30 problems) | Advanced (30 problems) |
|---|---|---|
| Gemini 3.1 Pro | 20.0% | 3.3% |
| Aristotle | 76.7% | 20.0% |
| LEAP | 83.3% | 56.7% |
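
To illustrate the `sorry` placeholder convention mentioned above, here is a minimal Lean 4 sketch. It is not taken from LEAP itself. The theorem names and the choice of statement are hypothetical, picked only to show how a goal can first be broken into subgoals left as `sorry` and later closed with real proofs.

```lean
-- Sketch stage (hypothetical example): the goal is decomposed into
-- subgoals, each left as `sorry`. Lean accepts the file but warns
-- that the declaration uses `sorry`.
theorem add_swap_sketch (a b c : Nat) : a + b + c = c + b + a := by
  have h1 : a + b = b + a := sorry
  have h2 : b + a + c = c + (b + a) := sorry
  have h3 : c + (b + a) = c + b + a := sorry
  rw [h1, h2, h3]

-- Final stage: every `sorry` is replaced by an actual proof term,
-- so the kernel checks the whole argument.
theorem add_swap (a b c : Nat) : a + b + c = c + b + a := by
  have h1 : a + b = b + a := Nat.add_comm a b
  have h2 : b + a + c = c + (b + a) := Nat.add_comm (b + a) c
  have h3 : c + (b + a) = c + b + a := (Nat.add_assoc c b a).symm
  rw [h1, h2, h3]

-- Lists the axioms a declaration depends on; a leftover `sorry`
-- would show up here as `sorryAx`.
#print axioms add_swap
```

One way to enforce "never in final outputs" is to run `#print axioms` on each finished theorem and reject any result that lists `sorryAx`. Whether LEAP performs this exact check is an assumption; the source only states that `sorry` is excluded from final outputs.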