DeepMind's Gemini Ultra 2 achieves gold-medal performance on FrontierMath
Google DeepMind has published results showing Gemini Ultra 2 solving 83% of the problems in the FrontierMath benchmark, a set of competition-level mathematics problems previously unsolved by AI. Expert mathematicians typically score in the same range on the benchmark.
Google DeepMind today reported that Gemini Ultra 2 has achieved 83% on the FrontierMath benchmark, a private evaluation set built by a panel of research mathematicians and previously considered out of reach for any model.
What FrontierMath Tests
FrontierMath problems are designed to require novel mathematical reasoning rather than pattern recall. Each problem has a single, unambiguous numeric answer and is held in escrow to prevent training-set leakage.
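The grading harness itself is private, but the single-answer property makes exact-match grading straightforward to sketch. The Problem type and grade function below are illustrative assumptions, not the benchmark's actual code; they show only the no-partial-credit comparison against an escrowed answer.

```python
# Illustrative sketch of exact-match grading for a single-numeric-answer
# benchmark. The Problem/grade names are assumptions, not FrontierMath's code.
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Problem:
    statement: str
    answer: Fraction  # exact escrowed answer, never shown to the model

def grade(problem: Problem, model_output: str) -> bool:
    """Return True only on an exact match; there is no partial credit."""
    try:
        submitted = Fraction(model_output.strip())  # accepts "3/4" or "0.75"
    except (ValueError, ZeroDivisionError):
        return False  # unparseable output scores zero
    return submitted == problem.answer
```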
Methodology
DeepMind ran Gemini Ultra 2 in an agentic configuration with access to a Python sandbox and a formal proof verifier. Each problem carried a budget of four hours of wall-clock time, and the model consumed an average of 600,000 reasoning tokens per problem.
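For concreteness, here is a minimal sketch of what an agentic loop under those budgets could look like. The model, sandbox, and verifier interfaces are hypothetical; DeepMind has not published its harness, so this illustrates only the budget accounting and tool-use pattern the methodology describes (treating the reported token average as a hard cap is itself a simplification).

```python
# Minimal sketch of an agentic solve loop with wall-clock and token budgets.
# The model / sandbox / verifier interfaces are hypothetical assumptions;
# DeepMind has not published its evaluation harness.
import time

WALL_CLOCK_BUDGET_S = 4 * 60 * 60  # 4 hours of wall-clock time per problem
TOKEN_BUDGET = 600_000             # reported *average* usage, treated here as a cap

def solve(problem: str, model, sandbox, verifier) -> str | None:
    """Step the model until it commits to an answer or a budget runs out."""
    deadline = time.monotonic() + WALL_CLOCK_BUDGET_S
    tokens_used = 0
    transcript = problem
    while time.monotonic() < deadline and tokens_used < TOKEN_BUDGET:
        step = model.generate(transcript)      # hypothetical model API
        tokens_used += step.token_count
        if step.code:                          # run proposed code in the sandbox
            transcript += sandbox.run(step.code)
        if step.proof:                         # check formal proof attempts
            transcript += verifier.check(step.proof)
        if step.final_answer is not None:
            return step.final_answer           # model committed to an answer
        transcript += step.text
    return None  # budget exhausted without a final answer
```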
Implications
An 83% score places Gemini Ultra 2 at the level of an IMO gold medalist on this benchmark and within a few points of the human-expert ceiling. The team notes that this is the first model to clear the 70% threshold researchers had set as a marker of genuine mathematical reasoning capability.