OpenAI’s gold medal performance on the International Math Olympiad
OpenAI claims a significant result: gold medal-level performance on the International Mathematical Olympiad. But the company is scant on details, and the result needs to be independently verified.
This is a genuinely impressive-sounding result from OpenAI, as reshared by Simon Willison:
“I’m excited to share that our latest OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition — the International Math Olympiad (IMO).”
The result comes from an unreleased model: it isn’t anything on OpenAI’s current site, and it isn’t the upcoming GPT-5. It’s also far from consumer-friendly, reportedly taking hours to solve a single problem, but on the face of it, this is an interesting outcome.
As OpenAI research scientist Alexander Wei says:
“Just to spell it out as clearly as possible: a next-word prediction machine (because that's really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies.”
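For readers unfamiliar with the term, a “next-word prediction machine” refers to autoregressive decoding: the model repeatedly predicts a probability distribution over the next token, picks one, appends it to the context, and loops. Here’s a minimal, illustrative sketch of that mechanism using the openly available GPT-2 model via Hugging Face’s transformers library. OpenAI’s experimental reasoning model is unreleased, so this stands in only for the general mechanism Wei describes, not their actual system; the prompt is made up for illustration.

```python
# Minimal sketch of autoregressive next-word prediction, the mechanism
# Wei describes. GPT-2 is a stand-in: OpenAI's experimental model is
# unreleased, so this illustrates the general technique only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Let n be a positive integer such that"  # hypothetical prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate one token at a time: at each step the model only produces a
# probability distribution over the next token, nothing more.
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits        # shape: (1, seq_len, vocab)
        next_id = logits[0, -1].argmax()        # greedy: most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

The surprising part of OpenAI’s claim is that this same basic loop, with no external tools, is said to produce competition-grade proofs.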
So, how can we independently evaluate and consider this outcome?
It’s worth noting that the proofs were graded by three former IMO medalists, not the competition’s official coordinators, with no indication of how they were paid or what the arrangement was. The model is also, like almost all AI models, a black box; it’s not clear how this result was achieved. In particular, while OpenAI claims no tools were used, it’s not clear what the training data was, or which techniques were used to build the model.
That’s not to dismiss these results outright! The IMO problems are new each year, so this can’t be simple memorization, and it has the potential to be a genuine breakthrough in computing. The next step would hopefully be a research paper that lays these things out. If this really is what they claim it is, it’s undeniably impressive. But it’s not enough to just say so; the result needs to be independently verified and replicated.
[Link]