MIT affiliates win AI for Math grants to speed up mathematical discovery


MIT Department of Mathematics researchers David Roe ’06 and Andrew Sutherland ’90, PhD ’07 are among the inaugural recipients of the Renaissance Philanthropy and XTX Markets’ AI for Math grants.

Four additional MIT alumni (Anshula Gandhi ’19; Viktor Kunčak SM ’01, PhD ’07; Gireeja Ranade ’07; and Damiano Testa PhD ’05) were also honored for separate projects.

The first 29 winning projects will support mathematicians and researchers at universities and organizations working to develop artificial intelligence systems that help advance mathematical discovery and research across several key tasks.

Roe and Sutherland, together with Chris Birkbeck of the University of East Anglia, will use their grant to boost automated theorem proving by building connections between the L-Functions and Modular Forms Database (LMFDB) and the Lean4 mathematics library (mathlib).

“Automated theorem provers are quite technically involved, but their development is under-resourced,” says Sutherland. With AI technologies such as large language models (LLMs), the barrier to entry for these formal tools is dropping rapidly, making formal verification frameworks accessible to working mathematicians.

Mathlib is a large, community-driven mathematical library for the Lean theorem prover, a formal system that verifies the correctness of each step in a proof. Mathlib currently contains on the order of 10^5 mathematical results (such as lemmas, propositions, and theorems). The LMFDB, a massive, collaborative online resource that serves as a kind of “encyclopedia” of modern number theory, contains more than 10^9 concrete statements. Sutherland and Roe are managing editors of the LMFDB.

Roe and Sutherland’s grant will be used for a project that aims to augment both systems, making the LMFDB’s results available inside mathlib as assertions that have not yet been formally proved, and providing precise formal definitions of the numerical data stored in the LMFDB. This bridge will benefit both human mathematicians and AI agents, and provide a framework for connecting other mathematical databases to formal theorem-proving systems.
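To make the idea of an "assertion that has not yet been formally proved" concrete, here is a minimal Lean 4 sketch. It is not the project's design, which the article does not describe: the names `formalRank`, `rank_37a1`, and `rank_37a1_pos` are hypothetical, and the use of `axiom` is an assumption chosen only for illustration. The one mathematical fact used is that the LMFDB lists the elliptic curve 37.a1 with rank 1.

```lean
-- Illustrative sketch only. Every name here is hypothetical and is not
-- taken from mathlib, the LMFDB, or the funded project.

/-- Stand-in for a formally defined invariant: the rank of the elliptic
curve with a given LMFDB-style label. A real bridge would define this
from mathlib's elliptic curve theory rather than leave it opaque. -/
opaque formalRank : String → Nat

/-- A database entry imported as an assertion that has not been formally
proved. The LMFDB lists the curve 37.a1 with rank 1. -/
axiom rank_37a1 : formalRank "37.a1" = 1

/-- A downstream result that relies on the imported assertion. -/
theorem rank_37a1_pos : 0 < formalRank "37.a1" := by
  simp [rank_37a1]

-- Lists the axioms the theorem depends on, so `rank_37a1` shows up as a
-- concrete target for later formalization.
#print axioms rank_37a1_pos
```

In a setup like this, a proof can draw on many database facts while it is being explored, and only the assertions that the finished proof actually depends on need to be formalized afterward.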

The main obstacles to automating mathematical discovery and proof are the limited amount of formalized mathematical knowledge, the high cost of formalizing complex results, and the gap between what is computationally accessible and what is feasible to formalize.

To address these obstacles, the researchers will use the funding to build tools for accessing the LMFDB from mathlib, making a large database of unformalized mathematical knowledge accessible to a formal proof system. This approach enables proof assistants to identify specific targets for formalization without the need to formalize the entire LMFDB corpus in advance.

“Making a large database of unformalized number-theoretic facts available inside mathlib will provide a powerful technique for mathematical discovery, since the set of facts an agent might wish to consider while searching for a theorem or proof is exponentially larger than the set of facts that eventually needs to be formalized in actually proving the theorem,” says Roe.

The researchers note that proving new theorems at the frontier of mathematical knowledge often involves steps that depend on a nontrivial computation. For instance, Andrew Wiles’ proof of Fermat’s Last Theorem uses what is known as the “3-5 trick” at a crucial point in the proof.

“This trick depends on the fact that the modular curve X_0(15) has only finitely many rational points, and none of those rational points correspond to a semi-stable elliptic curve,” according to Sutherland. “This fact was known well before Wiles’ work, and is easy to verify using computational tools available in modern computer algebra systems, but it is not something one can realistically prove using pencil and paper, nor is it necessarily easy to formalize.”

While formal theorem provers are being connected to computer algebra systems for more efficient verification, tapping into computational outputs in existing mathematical databases offers several other advantages.

Using stored results leverages the hundreds of CPU-years of computation time already spent in creating the LMFDB, saving money that would be needed to redo these computations. Having precomputed information available also makes it feasible to search for examples or counterexamples without knowing ahead of time how broad the search could be. In addition, mathematical databases are curated repositories, not simply a random collection of facts.

“The fact that number theorists emphasized the role of the conductor in databases of elliptic curves has already proved to be crucial to at least one notable mathematical discovery made using machine learning tools: murmurations,” says Sutherland.

“Our next steps are to build a team, engage with both the LMFDB and mathlib communities, begin to formalize the definitions that underpin the elliptic curve, number field, and modular form sections of the LMFDB, and make it possible to run LMFDB searches from inside mathlib,” says Roe. “If you are an MIT student interested in getting involved, feel free to reach out!”
