Simon Willison on mathematics

18 posts tagged “mathematics”

2025

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad (via) OpenAI beat them to the punch in terms of publicity by publishing their results on Saturday, but a team from Google Gemini achieved an equally impressive result on this year's International Mathematical Olympiad, scoring a gold-medal performance with their custom research model.

(I saw an unconfirmed rumor that the Gemini team had to wait until Monday for approval from Google PR - this turns out to be inaccurate, see update below.)

It's interesting that Gemini achieved the exact same score as OpenAI, 35/42, and were able to solve the same set of questions - 1 through 5, failing only to answer 6, which is designed to be the hardest question.

Each question is worth seven points, so 35/42 corresponds to full marks on five out of the six problems.

Only 6 of this year's 630 human contestants scored all 7 points for question 6, and just 55 more scored greater than 0 points for that question.
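Here's a quick sketch of that arithmetic in Python. The variable names are my own, and the contestant figures are just the ones quoted above, not pulled from any official data set:

```python
# Scoring arithmetic for the IMO figures quoted above.
POINTS_PER_QUESTION = 7
QUESTIONS = 6

max_score = POINTS_PER_QUESTION * QUESTIONS   # total points available
five_solved = POINTS_PER_QUESTION * 5         # full marks on five questions

# Human results on question 6, as quoted in the text.
contestants = 630
full_marks_q6 = 6
partial_q6 = 55
any_points_q6 = full_marks_q6 + partial_q6    # anyone scoring above zero
share_any_points = any_points_q6 / contestants

print(f"{five_solved}/{max_score} points for five solved questions")
print(f"{any_points_q6} of {contestants} contestants "
      f"({share_any_points:.1%}) scored anything on question 6")
```

So fewer than one in ten human contestants got any credit at all on the question both models missed.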

OpenAI claimed their model had not been optimized for IMO questions. Gemini's model was different - emphasis mine:

We achieved this year’s result using an advanced version of Gemini Deep Think – an enhanced reasoning mode for complex problems that incorporates some of our latest research techniques, including parallel thinking. This setup enables the model to simultaneously explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought.

To make the most of the reasoning capabilities of Deep Think, we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions.

The Gemini team, like the OpenAI team, achieved this result with no tool use or internet access for the model.

Gemini's solutions are listed in this PDF. If you are mathematically inclined you can compare them with OpenAI's solutions on GitHub.

Last year Google DeepMind achieved a silver medal in IMO, solving four of the six problems using custom models called AlphaProof and AlphaGeometry 2:

First, the problems were manually translated into formal mathematical language for our systems to understand. In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.

This year's result, scoring gold with a single model, within the allotted time and with no manual step to translate the problems first, is much more impressive.

Update: Concerning the timing of the news, DeepMind CEO Demis Hassabis says:

Btw as an aside, we didn’t announce on Friday because we respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved

We've now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!

OpenAI's Noam Brown:

Before we shared our results, we spoke with an IMO board member, who asked us to wait until after the award ceremony to make it public, a request we happily honored.

We announced at ~1am PT (6pm AEST), after the award ceremony concluded. At no point did anyone request that we announce later than that.

As far as I can tell the Gemini team was participating in an official capacity, while OpenAI were not. Noam again:

~2 months ago, the IMO emailed us about participating in a formal (Lean) version of the IMO. We’ve been focused on general reasoning in natural language without the constraints of Lean, so we declined. We were never approached about a natural language math option.

Neither OpenAI nor Gemini used Lean in their attempts, which would have counted as tool use.

# 21st July 2025, 7:14 pm / mathematics, ai, openai, generative-ai, llms, gemini, llm-reasoning

An AI tool that gets gold on the IMO is obviously immensely impressive. Does it mean math is “solved”? Is an AI-generated proof of the Riemann hypothesis clearly on the horizon? Obviously not.

Worth keeping timescales in mind here: IMO competitors spend an average of 1.5 hrs on each problem. High-quality math research, by contrast, takes months or years.

What are the obstructions to AI performing high-quality autonomous math research? I don’t claim to know for sure, but I think they include many of the same obstructions that prevent it from doing many jobs: Long context, long-term planning, consistency, unclear rewards, lack of training data, etc.

It’s possible that some or all of these will be solved soon (or have been solved) but I think it’s worth being cautious about over-indexing on recent (amazing) progress.

— Daniel Litt, Assistant Professor of mathematics, University of Toronto

# 21st July 2025, 3:13 pm / mathematics, ai, generative-ai, llms, daniel-litt

OpenAI’s gold medal performance on the International Math Olympiad. This feels notable to me. OpenAI research scientist Alexander Wei:

I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs. [...]

Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.

In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold!

HUGE congratulations to the team—Sheryl Hsu, Noam Brown, and the many giants whose shoulders we stood on—for turning this crazy dream into reality! I am lucky I get to spend late nights and early mornings working alongside the very best.

Btw, we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months.

(Normally I would just link to the tweet, but in this case Alexander built a thread... and Twitter threads no longer work for linking as they're only visible to users with an active Twitter account.)

Here's Wikipedia on the International Mathematical Olympiad:

It is widely regarded as the most prestigious mathematical competition in the world. The first IMO was held in Romania in 1959. It has since been held annually, except in 1980. More than 100 countries participate. Each country sends a team of up to six students, plus one team leader, one deputy leader, and observers.

This year's event is in Sunshine Coast, Australia. Here's the web page for the event, which includes a button you can click to access a PDF of the six questions - maybe they don't link to that document directly to discourage it from being indexed.

The first of the six questions looks like this:

Alexander shared the proofs produced by the model on GitHub. They're in a slightly strange format - not quite MathML embedded in Markdown - which Alexander excuses since "it is very much an experimental model".

The most notable thing about this is that the unnamed model achieved this score without using any tools. OpenAI's Sebastien Bubeck emphasizes that here:

Just to spell it out as clearly as possible: a next-word prediction machine (because that's really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies.

There's a bunch more useful context in this thread by Noam Brown, including a note that this model wasn't trained specifically for IMO problems:

Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques.

So what’s different? We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare that to AIME, where answers are simply an integer from 0 to 999.

Also this model thinks for a long time. o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it’s also more efficient with its thinking. And there’s a lot of room to push the test-time compute and efficiency further.

It’s worth reflecting on just how fast AI progress has been, especially in math. In 2024, AI labs were using grade school math (GSM8K) as an eval in their model releases. Since then, we’ve saturated the (high school) MATH benchmark, then AIME, and now are at IMO gold. [...]

When you work at a frontier lab, you usually know where frontier capabilities are months before anyone else. But this result is brand new, using recently developed techniques. It was a surprise even to many researchers at OpenAI. Today, everyone gets to see where the frontier is.

# 19th July 2025, 4:27 pm / mathematics, ai, openai, generative-ai, llms, llm-reasoning

Basically any resource on a difficult subject—a colleague, Google, a published paper—will be wrong or incomplete in various ways. Usefulness isn’t only a matter of correctness.

For example, suppose a colleague has a question she thinks I might know the answer to. Good news: I have some intuition and say something. Then we realize it doesn’t quite make sense, and go back and forth until we converge on something correct.

Such a conversation is full of BS but crucially we can interrogate it and get something useful out of it in the end. Moreover this kind of back and forth allows us to get to the key point in a way that might be difficult when reading a difficult ~50-page paper.

To be clear o3-mini-high is orders of magnitude less useful for this sort of thing than talking to an expert colleague. But still useful along similar dimensions (and with a much broader knowledge base).

— Daniel Litt

# 1st February 2025, 9:46 pm / mathematics, ai, generative-ai, llms, o3, daniel-litt

Largest known prime number (via) Discovered on 12th October 2024 by the Great Internet Mersenne Prime Search. The new largest prime number is 2^136279841-1 - 41,024,320 digits long.
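The digit count can be checked without ever constructing the number itself. This is a minimal sketch, assuming the standard result that a positive integer n has floor(log10(n)) + 1 decimal digits; the function name is my own:

```python
import math


def mersenne_digit_count(p: int) -> int:
    """Number of decimal digits in 2**p - 1, for p >= 1.

    2**p is never a power of 10, so subtracting 1 does not change the
    digit count, and log10(2**p) = p * log10(2).
    """
    return math.floor(p * math.log10(2)) + 1


print(mersenne_digit_count(136279841))
```

This should agree with the 41,024,320 digits quoted above. Double-precision floats are accurate enough here because the fractional part of the product is nowhere near an integer boundary.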

# 2nd January 2025, 7:39 am / mathematics

2024

[… OpenAI’s o1] could work its way to a correct (and well-written) solution if provided a lot of hints and prodding, but did not generate the key conceptual ideas on its own, and did make some non-trivial mistakes. The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student. However, this was an improvement over previous models, whose capability was closer to an actually incompetent graduate student.

— Terence Tao

# 15th September 2024, 12:04 am / mathematics, ai, openai, generative-ai, llms, o1

An animated introduction to Fourier Series (via) Outstanding essay and collection of animated explanations (created using p5.js) by Andrei Ciobanu explaining Fourier transforms, starting with circles, pi, radians and building up from there.

I found Fourier stuff only really clicked for me when it was accompanied by clear animated visuals, and these are a beautiful example of those done really well.

# 5th June 2024, 3:43 pm / mathematics, processing, explorables

2023

Google DeepMind used a large language model to solve an unsolvable math problem. I’d been wondering how long it would be before we saw this happen: a genuine new scientific discovery found with the aid of a Large Language Model.

DeepMind found a solution to the previously open “cap set” problem using Codey, a fine-tuned variant of PaLM 2 specializing in code. They used it to generate Python code and found a solution after “a couple of million suggestions and a few dozen repetitions of the overall process”.

# 16th December 2023, 1:37 am / google, mathematics, ai, generative-ai, llms

2019

An Interactive Introduction to Fourier Transforms (via) I love interactive explorable explanations and this is the best I’ve seen in a while: Jez Swanson breaks down exactly what a Fourier transform does, first by letting you interactively draw and deconstruct wave patterns and then by showing epicycles and explaining JPEG compression. All with not a formula in sight!
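For anyone who does want to see the deconstruction step as code, here's a rough sketch of a naive discrete Fourier transform using only the Python standard library. This is my own illustration, not anything taken from Jez Swanson's article, and the test signal (two sine waves at 3 and 7 cycles) is an arbitrary choice:

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a list of real samples."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples))
        for k in range(n)
    ]


# Build a wave out of two sine components: 3 cycles at amplitude 1.0
# and 7 cycles at amplitude 0.5 across the sample window.
n = 64
signal = [
    math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 7 * t / n)
    for t in range(n)
]

spectrum = dft(signal)

# Scale each bin so it reads as the amplitude of that frequency component.
amplitudes = [2 * abs(c) / n for c in spectrum[: n // 2]]
strongest = sorted(range(n // 2), key=lambda k: amplitudes[k], reverse=True)[:2]

print(sorted(strongest))
```

The two strongest bins are the two frequencies that went into the wave, which is the "deconstruct a wave pattern" idea in miniature. A real implementation would use an FFT rather than this quadratic loop.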

# 12th January 2019, 2:55 am / mathematics, explorables

2011

Would having a maths degree put you at that much of a disadvantage against a CS student when it comes to jobs?

No. Plenty of the great programmers I know have maths, physics or even literature degrees. Read a couple of classic computer science text books and get some programming projects under your belt and you’ll be fine.

[... 64 words]

2:27 pm / 31st December 2011 / jobs, mathematics, programming, quora, careers

2010

Google Image Charts: Mathematical (TeX) Formulas (via) I’m not sure when they added this, but you can now use the Google Charts Image API to render mathematical formulas, specified using TeX syntax. Wordpress.com and Wikipedia have both offered this feature for quite a while, but now you can use it anywhere on the Web.

# 12th February 2010, 9:42 am / formula, google, google-charts, mathematics, tex

GPS and Relativity (via) GPS satellite clock ticks need an accuracy of 20-30 nanoseconds. The satellites move fast enough that their clocks fall behind by 7 microseconds a day due to time dilation, but orbit high enough that the curvature of spacetime due to the Earth’s mass puts them forward by another 45 microseconds. GPS receivers have to perform relativistic calculations to determine their location!
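Those two figures can be roughly reproduced from first principles. This is a back-of-the-envelope sketch under my own assumptions: a circular orbit at a radius of about 26,560 km from the Earth's centre, a stationary receiver on the surface at the mean Earth radius, and first-order approximations for both effects:

```python
import math

C = 299_792_458.0           # speed of light in m/s (exact by definition)
GM_EARTH = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
EARTH_RADIUS = 6.371e6      # mean Earth radius in m (assumed receiver position)
GPS_ORBIT_RADIUS = 2.656e7  # assumed orbital radius from Earth's centre, m
SECONDS_PER_DAY = 86_400

# Speed of a satellite on a circular orbit at that radius.
orbital_speed = math.sqrt(GM_EARTH / GPS_ORBIT_RADIUS)

# Special relativity: a moving clock runs slow by roughly v^2 / (2 c^2).
sr_lag = orbital_speed**2 / (2 * C**2) * SECONDS_PER_DAY

# General relativity: a clock higher in the gravity well runs fast by
# roughly the difference in gravitational potential divided by c^2.
gr_gain = (GM_EARTH / C**2) * (1 / EARTH_RADIUS - 1 / GPS_ORBIT_RADIUS) * SECONDS_PER_DAY

net_drift = gr_gain - sr_lag
range_error_km = net_drift * C / 1000  # distance light travels in that time

print(f"time dilation lag:   {sr_lag * 1e6:.1f} microseconds/day")
print(f"gravitational gain:  {gr_gain * 1e6:.1f} microseconds/day")
print(f"net drift:           {net_drift * 1e6:.1f} microseconds/day")
print(f"uncorrected error:   {range_error_km:.1f} km/day")
```

With these inputs the two effects land close to the 7 and 45 microsecond figures above, and the net drift corresponds to a ranging error on the order of ten kilometres per day if left uncorrected.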

# 11th January 2010, 9:17 am / gps, mathematics, physics, relativity, spacescience

2009

Mobius Sliced Linked Bagel. “It is much more fun to put cream cheese on these bagels than on an ordinary bagel. In addition to the intellectual stimulation, you get more cream cheese, because there is slightly more surface area.”

# 9th December 2009, 8:03 am / bagels, breakfast, food, funny, mathematics, mobius

2004

Python in Mathematics

Python in the Mathematics Curriculum by Kirby Urner is something of a sprawling masterpiece. It really comes in four parts: the first is a history of computer science in education, the second an appraisal of the impact of open source on education and the world at large, the third a dive into the things that make Python so suitable for enhancing the mathematics curriculum, and the fourth a discussion of how computer science and traditional mathematics are likely to play off against each other in the field of high school education.

[... 319 words]

2:44 am / 22nd April 2004 / education, mathematics, python

2003

Pythonic Geometry. Teaching maths and Python to 13-year-olds.

# 15th December 2003, 2:28 am / mathematics

Python for teaching mathematics

Kirby Urner provides some great examples of how Python can be used as an aid to understanding mathematics on the marketing-python mailing list. I particularly liked this demonstration of Pascal’s triangle using Python generators:

[... 139 words]
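The demonstration itself isn't reproduced on this page. To give a flavour of the idea, here's one possible sketch of Pascal's triangle as a generator. To be clear, this is my own illustration written for modern Python, not Kirby Urner's code, and the function name is made up:

```python
from itertools import islice


def pascal_rows():
    """Yield the rows of Pascal's triangle forever, starting with [1]."""
    row = [1]
    while True:
        yield row
        # Each entry is the sum of the two entries above it; padding the
        # row with a zero on either side handles the edges.
        row = [left + right for left, right in zip([0] + row, row + [0])]


# Generators are lazy, so take just the first five rows.
first_five = list(islice(pascal_rows(), 5))
for r in first_five:
    print(r)
```

Because the generator is infinite, islice is what keeps the loop finite; each row is a fresh list, so rows you've already collected aren't modified later.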

12 pm / 13th September 2003 / mathematics, python, teaching

2002

Maths for apps problems class

I didn’t quite understand this part of the lecture as we arrived late. These are the notes copied from the board.

[... 1,060 words]

2:12 pm / 30th September 2002 / mathematics

Maths for Apps lecture 1

These notes are for Dr Daniel Richardson’s course “Mathematics for Applications I” at the University of Bath. The required text book is “Linear Algebra with Applications” by G. Williams, published by Jones and Bartlett.

[... 803 words]

6:53 pm / 23rd September 2002 / mathematics

Simon Willison’s Weblog