Build with our next generation AI systems

Explore models chevron_right

Gemini

Our most intelligent AI models

2.5 Pro
2.5 Flash
2.0 Flash-Lite
Learn more

Gemma

Lightweight, state-of-the-art open models

Gemma 3
Gemma 3n
ShieldGemma 2
Learn more

Generative models

Image, music and video generation models

Imagen
Lyria
Veo

Experiments

AI prototypes and experiments

Project Astra
Project Mariner
Gemini Diffusion

Our latest AI breakthroughs and updates from the lab

Explore research chevron_right

Projects

Explore some of the biggest AI innovations

Learn more

Publications

Read a selection of our recent papers

Learn more

News

Discover the latest updates from our lab

Learn more

Unlocking a new era of discovery with AI

Explore science chevron_right

AI for biology

AlphaFold
AlphaMissense
AlphaProteo

AI for climate and sustainability

WeatherNext

AI for mathematics and computer science

AlphaEvolve
AlphaProof
AlphaGeometry

AI for physics and chemistry

GNoME
Fusion
AlphaQubit

AI transparency

SynthID

Our mission is to build AI responsibly to benefit humanity

About Google DeepMind chevron_right

News

Discover our latest AI breakthroughs, projects, and updates

Learn more

Careers

We’re looking for people who want to make a real, positive impact on the world

Learn more

Milestones

For over 20 years, Google has worked to make AI helpful for everyone

Learn more

Education

We work to make AI more accessible to the next generation

Learn more

Responsibility

Ensuring AI safety through proactive security, even against evolving threats

Learn more

The Podcast

Uncover the extraordinary ways AI is transforming our world

Learn more
Models
Research
Science
About

Google DeepMind

Gemini

Our most intelligent AI models

Chat with Gemini
Try in Google AI Studio

Models

  • Gemini 2.5: Our most intelligent models are getting even better
  • Gemini 2.5 Pro Preview: even better coding performance
  • Build rich, interactive web apps with an updated Gemini 2.5 Pro

Gemini 2.5 models are capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy.

  • What's new
  • Models
  • Hands-on
  • Performance
  • Safety
  • Build

What's new

Access the latest preview of Gemini 2.5 Pro

We’re introducing an upgraded preview of Gemini 2.5 Pro, our most intelligent model yet.

Try in Google AI Studio

Deep Think

We’re making Gemini 2.5 Pro even better by introducing an enhanced reasoning mode called Deep Think.

Learn more

Native audio

Converse in more expressive ways with native audio outputs that capture the subtle nuances of how we speak.

Learn more

An even better 2.5 Flash

Improved across key benchmarks for reasoning, multimodality, code and long context while getting even more efficient.

Try in Google AI Studio

Model family

Gemini 2.5 builds on the best of Gemini — with native multimodality and a long context window.

  • Preview

    2.5 Pro

    Best for coding and complex prompts

    Learn more
  • Preview

    2.5 Flash

    Best for fast performance on complex tasks

    Learn more
  • General availability

    2.0 Flash-Lite

    Best for cost-efficient performance

    Learn more

Hands-on with 2.5 Pro

See how Gemini 2.5 Pro uses its reasoning capabilities to create interactive simulations and do advanced coding.

Make an interactive animation

See how Gemini 2.5 Pro uses its reasoning capabilities to create an interactive animation of “cosmic fish” with a simple prompt.

Create your own dinosaur game

Watch Gemini 2.5 Pro create an endless runner game, using executable code from a single line prompt.

Code a fractal visualization

See how Gemini 2.5 Pro creates a simulation of intricate fractal patterns to explore a Mandelbrot set.

Plot interactive economic data

Watch Gemini 2.5 Pro use its reasoning capabilities to create an interactive bubble chart to visualize economic and health indicators over time.

Animate complex behavior

See how Gemini 2.5 Pro creates an interactive Javascript animation of colorful boids inside a spinning hexagon.

Code particle simulations

Watch Gemini 2.5 Pro use its reasoning capabilities to create an interactive simulation of a reflection nebula.

Performance

Gemini 2.5 is state-of-the-art across a wide range of benchmarks.

Benchmarks

Gemini 2.5 Pro demonstrates significantly improved performance across a wide range of benchmarks.

| Benchmark | Gemini 2.5 Pro Preview 06-05 (Thinking) | OpenAI o3 (High) | OpenAI o4-mini (High) | Claude Opus 4 (32k thinking) | Grok 3 Beta (Extended thinking) | DeepSeek R1 05-28 |
|---|---|---|---|---|---|---|
| Input price ($/1M tokens, no caching) | $1.25 ($2.50 > 200k tokens) | $10.00 | $1.10 | $15.00 | $3.00 | $0.55 |
| Output price ($/1M tokens) | $10.00 ($15.00 > 200k tokens) | $40.00 | $4.40 | $75.00 | $15.00 | $2.19 |
| Reasoning & knowledge: Humanity's Last Exam (no tools) | 21.6% | 20.3% | 14.3% | 10.7% | — | 14.0%* |
| Science: GPQA diamond (single attempt) | 86.4% | 83.3% | 81.4% | 79.6% | 80.2% | 81.0% |
| Science: GPQA diamond (multiple attempts) | — | — | — | 83.3% | 84.6% | — |
| Mathematics: AIME 2025 (single attempt) | 88.0% | 88.9% | 92.7% | 75.5% | 77.3% | 87.5% |
| Mathematics: AIME 2025 (multiple attempts) | — | — | — | 90.0% | 93.3% | — |
| Code generation: LiveCodeBench, 1/1/2025-5/1/2025 (single attempt) | 69.0% | 72.0% | 75.8% | 51.1% | — | 70.5% |
| Code editing: Aider Polyglot | 82.2% (diff-fenced) | 79.6% (diff) | 72.0% (diff) | 72.0% (diff) | 53.3% (diff) | 71.6% |
| Agentic coding: SWE-bench Verified (single attempt) | 59.6% | 69.1% | 68.1% | 72.5% | — | — |
| Agentic coding: SWE-bench Verified (multiple attempts) | 67.2% | — | — | 79.4% | — | 57.6% |
| Factuality: SimpleQA | 54.0% | 48.6% | 19.3% | — | 43.6% | 27.8% |
| Factuality: FACTS Grounding | 87.8% | 69.6% | 62.1% | 77.7% | 74.8% | — |
| Visual reasoning: MMMU (single attempt) | 82.0% | 82.9% | 81.6% | 76.5% | 76.0% | no MM support |
| Visual reasoning: MMMU (multiple attempts) | — | — | — | — | 78.0% | no MM support |
| Image understanding: Vibe-Eval (Reka) | 67.2% | — | — | — | — | no MM support |
| Video understanding: VideoMMMU | 83.6% | — | — | — | — | no MM support |
| Long context: MRCR v2 (8-needle), 128k (average) | 58.0% | 57.1% | 36.3% | — | 34.0% | — |
| Long context: MRCR v2 (8-needle), 1M (pointwise) | 16.4% | no support | no support | no support | no support | no support |
| Multilingual performance: Global MMLU (Lite) | 89.2% | — | — | — | — | — |

Methodology

Gemini results: All Gemini scores are pass@1. "Single attempt" settings allow no majority voting or parallel test-time compute; "multiple attempts" settings allow test-time selection of the candidate answer. All results are run with the AI Studio API for the model ID gemini-2.5-pro-preview-06-05 with default sampling settings. To reduce variance, we average over multiple trials for smaller benchmarks. The Aider Polyglot score is the pass rate averaged over 3 trials. Vibe-Eval results are reported using Gemini as a judge.

Non-Gemini results: All results for non-Gemini models are sourced from the providers' self-reported numbers unless noted otherwise below.

All SWE-bench Verified numbers follow official provider reports, using different scaffoldings and infrastructure. Google's scaffolding for "multiple attempts" on SWE-bench draws multiple trajectories and re-scores them using the model's own judgement.

Thinking vs. not-thinking: For Claude 4, results are reported for the reasoning model where available (HLE, LCB, Aider). For Grok 3, all results use extended reasoning except SimpleQA (based on xAI reports) and Aider. For OpenAI models, the high reasoning setting is shown where results are available (except for GPQA, AIME 2025, SWE-bench, FACTS, MMMU).

Single attempt vs. multiple attempts: When two numbers are reported for the same eval, the higher number uses majority voting with n=64 for Grok models and internal scoring with parallel test-time compute for Anthropic models.

Result sources: Where provider numbers are not available, we report numbers from leaderboards for these benchmarks: Humanity's Last Exam results are sourced from https://agi.safe.ai/ and https://scale.com/leaderboard/humanitys_last_exam; AIME 2025 numbers from https://matharena.ai/; LiveCodeBench results from https://livecodebench.github.io/leaderboard.html (1/1/2025 - 5/1/2025 in the UI); Aider Polyglot numbers from https://aider.chat/docs/leaderboards/; FACTS from https://www.kaggle.com/benchmarks/google/facts-grounding. For MRCR v2, which is not publicly available yet, we include 128k results as a cumulative score so they are comparable with other models, and a pointwise value at the 1M context window to show the model's capability at full length. The methodology in this table differs from previously published MRCR v2 results, as we have decided to focus on a harder, 8-needle version of the benchmark going forward.

API costs are sourced from providers' websites and are current as of June 5th.

* indicates evaluated on text problems only (without images)
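The tiered pricing in the table above can be sketched as a small calculator. This is an illustration of the arithmetic only, assuming (as on the current pricing page) that the higher rate applies to the whole request once the prompt exceeds 200k tokens; check the official pricing documentation before relying on it.

```python
# Illustrative cost calculator for the tiered Gemini 2.5 Pro preview pricing
# in the table above: $1.25 per 1M input tokens ($2.50 beyond a 200k-token
# prompt) and $10.00 per 1M output tokens ($15.00 beyond 200k).

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate cost in USD for one Gemini 2.5 Pro request."""
    long_prompt = input_tokens > 200_000
    input_rate = 2.50 if long_prompt else 1.25     # $ per 1M input tokens
    output_rate = 15.00 if long_prompt else 10.00  # $ per 1M output tokens
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate

if __name__ == "__main__":
    print(f"${request_cost(10_000, 2_000):.4f}")   # short prompt
    print(f"${request_cost(500_000, 8_000):.4f}")  # long-context prompt
```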

Building responsibly in the agentic era

As we develop these new technologies, we recognize the responsibility they entail, and we aim to prioritize safety and security in all our efforts.

Learn more

For developers

Gemini’s advanced thinking, native multimodality, and massive context window empower developers to build next-generation experiences.

Start building

Developer ecosystem

Build with cutting-edge generative AI models and tools to make AI helpful for everyone.

Google AI Studio

Build with the latest models from Google DeepMind

Gemini API

Easily integrate Google’s most capable AI model into your apps

Accessing our latest AI models

We want developers to gain access to our models as quickly as possible. We’re making these available through Google AI Studio.

Sign in to Google AI Studio
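As a minimal sketch of what building against the Gemini API looks like, the snippet below calls the REST generateContent endpoint using only Python's standard library. The endpoint version, model ID, and response structure follow the current public API reference and should be treated as assumptions that may change; the network call only runs if a GEMINI_API_KEY environment variable is set.

```python
import json
import os
import urllib.request

# Assumed current REST root per the public Gemini API reference.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"

def build_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for a generateContent call."""
    url = f"{API_ROOT}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(body).encode("utf-8")

def generate(model: str, prompt: str, api_key: str) -> str:
    """Send one prompt and return the first candidate's text."""
    url, body = build_request(model, prompt)
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Response shape per the public API reference; may change across versions.
    return data["candidates"][0]["content"]["parts"][0]["text"]

if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        print(generate("gemini-2.5-flash", "Say hello in one word.", key))
```

The same request can be issued through the official SDKs; the raw REST form is shown here only to make the request shape explicit.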

Gemini Flash

Preview

Gemini 2.5 Flash

Best for fast performance on complex tasks

Try in Google AI Studio

Our powerful and most efficient workhorse model, designed for speed and low cost.

Speed and value at scale

Ideal for tasks like summarization, chat applications, data extraction, and captioning.

  • Thinking budget

    Control how much 2.5 Flash reasons to balance latency and cost.

  • Natively multimodal

    Understands input across text, audio, images and video.

  • Long context

    Explore vast datasets with a 1-million token context window.

Adaptive and budgeted thinking

Adaptive controls and adjustable thinking budgets allow you to balance performance and cost.

  • Calibrated

    The model explores diverse thinking strategies, leading to more accurate and relevant outputs.

  • Controllable

    Developers have fine-grained control over the model's thinking process, allowing them to manage resource usage.

  • Adaptive

    When no thinking budget is set, the model assesses the complexity of a task and calibrates the amount of thinking accordingly.

Preview

Native audio

Converse in more expressive ways with native audio outputs that capture the subtle nuances of how we speak. Seamlessly switch between 24 languages, all with the same voice.

Try in Google AI Studio

Benchmarks

| Benchmark | Gemini 2.5 Flash Preview 05-20 (Thinking) | Gemini 2.0 Flash | OpenAI o4-mini | Claude 3.7 Sonnet (64k extended thinking) | Grok 3 Beta (Extended thinking) | DeepSeek R1 |
|---|---|---|---|---|---|---|
| Input price ($/1M tokens) | $0.15 | $0.10 | $1.10 | $3.00 | $3.00 | $0.55 |
| Output price ($/1M tokens) | $0.60 (no thinking) / $3.50 (thinking) | $0.40 | $4.40 | $15.00 | $15.00 | $2.19 |
| Reasoning & knowledge: Humanity's Last Exam (no tools) | 11.0% | 5.1% | 14.3% | 8.9% | — | 8.6%* |
| Science: GPQA diamond (single attempt, pass@1) | 82.8% | 60.1% | 81.4% | 78.2% | 80.2% | 71.5% |
| Science: GPQA diamond (multiple attempts) | — | — | — | 84.8% | 84.6% | — |
| Mathematics: AIME 2025 (single attempt, pass@1) | 72.0% | 27.5% | 92.7% | 49.5% | 77.3% | 70.0% |
| Mathematics: AIME 2025 (multiple attempts) | — | — | — | — | 93.3% | — |
| Code generation: LiveCodeBench v5 (single attempt, pass@1) | 63.9% | 34.5% | — | — | 70.6% | 64.3% |
| Code generation: LiveCodeBench v5 (multiple attempts) | — | — | — | — | 79.4% | — |
| Code editing: Aider Polyglot | 61.9% / 56.7% (whole / diff-fenced) | 22.2% (whole) | 68.9% / 58.2% (whole / diff) | 64.9% (diff) | 53.3% (diff) | 56.9% (diff) |
| Agentic coding: SWE-bench Verified | 60.4% | — | 68.1% | 70.3% | — | 49.2% |
| Factuality: SimpleQA | 26.9% | 29.9% | — | — | 43.6% | 30.1% |
| Factuality: FACTS Grounding | 85.3% | 84.6% | 62.1% | 78.8% | 74.8% | 56.8% |
| Visual reasoning: MMMU (single attempt, pass@1) | 79.7% | 71.7% | 81.6% | 75.0% | 76.0% | no MM support |
| Visual reasoning: MMMU (multiple attempts) | — | — | — | — | 78.0% | no MM support |
| Image understanding: Vibe-Eval (Reka) | 65.4% | 56.4% | — | — | — | no MM support |
| Long context: MRCR v2, 128k (average) | 74.0% | 36.0% | 49.0% | — | 54.0% | 45.0% |
| Long context: MRCR v2, 1M (pointwise) | 32.0% | 6.0% | — | — | — | — |
| Multilingual performance: Global MMLU (Lite) | 88.4% | 83.4% | — | — | — | — |

Methodology

Gemini results: All Gemini scores are pass@1 (no majority voting or parallel test-time compute unless indicated otherwise). All results are run with the AI Studio API for the model IDs gemini-2.5-flash-preview-05-20 and gemini-2.0-flash with default sampling settings. To reduce variance, we average over multiple trials for smaller benchmarks. Vibe-Eval results are reported using Gemini as a judge.

Non-Gemini results: All results for non-Gemini models are sourced from the providers' self-reported numbers unless noted otherwise below. All SWE-bench Verified numbers follow official provider reports, using different scaffoldings and infrastructure. Google's scaffolding draws multiple trajectories and re-scores them using the model's own judgement.

Thinking vs. not-thinking: For Claude 3.7 Sonnet, the GPQA, AIME 2024, and MMMU results come with 64k extended thinking, Aider with 32k, and HLE with 16k. Remaining results come from the non-thinking model due to result availability. For Grok 3, all results use extended reasoning except SimpleQA (based on xAI reports) and Aider.

Single attempt vs. multiple attempts: When two numbers are reported for the same eval, the higher number uses majority voting with n=64 for Grok models and internal scoring with parallel test-time compute for Anthropic models.

Result sources: Where provider numbers are not available, we report numbers from leaderboards for these benchmarks: Humanity's Last Exam results are sourced from https://agi.safe.ai/ and https://scale.com/leaderboard/humanitys_last_exam; AIME 2025 numbers from https://matharena.ai/; LiveCodeBench results from https://livecodebench.github.io/leaderboard.html (10/1/2024 - 2/1/2025 in the UI); Aider Polyglot numbers from https://aider.chat/docs/leaderboards/; FACTS from https://www.kaggle.com/benchmarks/google/facts-grounding. For MRCR v2, which is not publicly available yet, we include 128k results as a cumulative score so they are comparable with previous results, and a pointwise value at the 1M context window to show the model's capability at full length.

API costs are sourced from providers' websites and are current as of May 20th.

* indicates evaluated on text problems only (without images)
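The thinking-dependent output pricing above ($0.60 per 1M output tokens with thinking off vs. $3.50 with thinking on, $0.15 per 1M input tokens either way) makes the cost trade-off easy to estimate. The sketch below is arithmetic only; actual bills also depend on how many thinking tokens the model generates, which this simplified model does not capture.

```python
# Back-of-the-envelope cost comparison for Gemini 2.5 Flash using the
# preview prices from the table above.

FLASH_INPUT = 0.15      # $ per 1M input tokens
FLASH_OUT_FAST = 0.60   # $ per 1M output tokens, thinking off
FLASH_OUT_THINK = 3.50  # $ per 1M output tokens, thinking on

def flash_cost(input_tokens: int, output_tokens: int, thinking: bool) -> float:
    """Approximate USD cost of one request under either output rate."""
    out_rate = FLASH_OUT_THINK if thinking else FLASH_OUT_FAST
    return input_tokens / 1e6 * FLASH_INPUT + output_tokens / 1e6 * out_rate

# The same 100k-in / 5k-out request under both modes:
cheap = flash_cost(100_000, 5_000, thinking=False)
deliberate = flash_cost(100_000, 5_000, thinking=True)
```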

Model information

2.5 Flash Model Card

| | 2.0 Flash | 2.5 Flash |
|---|---|---|
| Model deployment status | General availability | Preview |
| Supported data types for input | Text, Image, Video, Audio | Text, Image, Video, Audio |
| Supported data types for output | Text | Text |
| Supported # tokens for input | 1M | 1M |
| Supported # tokens for output | 8k | 64k |
| Knowledge cutoff | June 2024 | January 2025 |
| Tool use | Search as a tool, Code execution, Function calling, Structured output | Search as a tool, Code execution |
| Best for | Low-latency scenarios, Automating tasks | Cost-efficient thinking, Well-rounded capabilities |
| Availability | Google AI Studio, Gemini API, Vertex AI, Gemini App | Google AI Studio, Gemini API, Vertex AI, Gemini App |

Try Gemini Flash

Preview

2.5 Flash

Best for fast performance on complex tasks

General availability

2.0 Flash

Best for fast performance on everyday tasks

General availability

2.0 Flash-Lite

Best for cost-efficient performance