Gemini 2.5 Pro

Gemini 2.5 Pro is a Pro-tier thinking model from Google, built for complex reasoning, coding, math, and science tasks, with strong results on human preference benchmarks and a context window of 1.0M tokens.

File Input, Reasoning, Tool Use, Vision (Image), Web Search, Tiered Cost, Implicit Caching

index.ts

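import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-2.5-pro',
  prompt: 'Why is the sky blue?'
})

// Write each text chunk to stdout as it arrives.
for await (const chunk of result.textStream) process.stdout.write(chunk)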

Playground

Try out Gemini 2.5 Pro by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
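
As a rough sketch of setting a provider preference in code, the helper below builds an options object that lists provider slugs in order of preference. It assumes the gateway reads an `order` array under `providerOptions.gateway`, so confirm the exact shape in the docs. The helper name `preferProviders` and the slugs `vertex` and `google` are placeholders for illustration; copy the real slugs from the provider list.

```typescript
// Assumption: the gateway reads an ordered list of provider slugs from
// providerOptions.gateway.order. Confirm the exact shape in the docs.
type GatewayProviderOptions = { gateway: { order: string[] } }

// Hypothetical helper: builds the provider-preference options object.
function preferProviders(...slugs: string[]): GatewayProviderOptions {
  if (slugs.length === 0) {
    throw new Error('Pass at least one provider slug')
  }
  // Drop duplicates while keeping the first-mentioned order.
  const order = slugs.filter((slug, i) => slugs.indexOf(slug) === i)
  return { gateway: { order } }
}

// Placeholder slugs; copy the real ones from the provider list.
const providerOptions = preferProviders('vertex', 'google')

// Intended use (not executed here):
//   streamText({ model: 'google/gemini-2.5-pro', prompt, providerOptions })
console.log(JSON.stringify(providerOptions))
```

Keeping the preference in one small object makes it easy to reuse across calls and to change without touching the prompt logic.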

Provider	Latency	Throughput	Input	Output	Cache Read	Cache Write	Web Search	Per Query	Release Date
Legal: Terms • Privacy	2.0s	106tps	$1.25/M	$10.00/M	$0.13/M	—	$35.00/K + input costs	—	03/20/2025
Legal: Terms • Privacy	2.8s	125tps	$1.25/M	$10.00/M	$0.13/M	—	$35.00/K + input costs	—	03/20/2025
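
To make the listed rates concrete, here is a small sketch that estimates the cost of a single request from the prices shown above ($1.25/M input, $10.00/M output, $0.13/M cached read). The function and field names are illustrative. Because the model is tagged as tiered-cost and web search is billed separately, treat the result as a rough flat-rate estimate, not a bill.

```typescript
// Listed per-million-token rates for Gemini 2.5 Pro in USD, taken from the provider list.
const RATES_PER_MILLION = { input: 1.25, output: 10.0, cachedRead: 0.13 }

interface Usage {
  inputTokens: number        // uncached prompt tokens
  outputTokens: number       // generated tokens
  cachedInputTokens?: number // prompt tokens served from cache
}

// Hypothetical helper: flat-rate estimate that ignores tiered pricing
// and per-query web search fees.
function estimateCostUsd(usage: Usage): number {
  const cached = usage.cachedInputTokens ?? 0
  const total =
    usage.inputTokens * RATES_PER_MILLION.input +
    usage.outputTokens * RATES_PER_MILLION.output +
    cached * RATES_PER_MILLION.cachedRead
  return total / 1_000_000
}

const cost = estimateCostUsd({ inputTokens: 200_000, outputTokens: 4_000 })
console.log(cost.toFixed(2)) // 0.29
```

In this example the 200,000 input tokens account for $0.25 and the 4,000 output tokens for $0.04, which shows how quickly long prompts dominate the total.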

More models by Google

Latency	Throughput	Input	Output	Cache Read	Cache Write	Web Search	Per Query	Release Date
2.4s	290tps	$1.50/M	$9.00/M	$0.15/M	—	$14.00/K + input costs	—	05/19/2026
0.8s	237tps	$0.25/M	$1.50/M	$0.03/M	—	$14.00/K + input costs	—	03/03/2026
4.1s	173tps	$2.00/M	$12.00/M	$0.20/M	—	$14.00/K + input costs	—	02/19/2026
0.6s	179tps	$0.50/M	$3.00/M	$0.05/M	—	$14.00/K + input costs	—	12/17/2025
0.7s	218tps	$0.10/M	$0.40/M	$0.01/M	—	$35.00/K + input costs	—	06/17/2025
0.4s	190tps	$0.30/M	$2.50/M	$0.03/M	—	$35.00/K + input costs	—	03/20/2025

About Gemini 2.5 Pro

Google introduced Gemini 2.5 Pro on March 20, 2025 as the flagship of the Gemini 2.5 generation of thinking models. Reasoning is its headline capability: Gemini 2.5 models work through a problem before responding, and Google attributes the resulting performance to a significantly enhanced base model combined with improved post-training. On reasoning benchmarks, 2.5 Pro posts strong results in math and science (including GPQA and AIME 2025) without majority voting or other test-time techniques that increase cost. On Humanity's Last Exam, a dataset designed by hundreds of subject matter experts to represent the human frontier of knowledge and reasoning, 2.5 Pro scores 18.8% without tool use.

Coding performance received particular attention. Gemini 2.5 Pro represents a significant leap over the 2.0 generation in creating web apps and agentic code applications, along with code transformation and editing. On SWE-Bench Verified, the industry-standard benchmark for agentic code evaluation, it scores 63.8% with a custom agent setup. It can generate a playable video game from a single-line prompt.

Gemini 2.5 Pro ships with a context window of 1.0M tokens, the largest among Gemini 2.5 models, and supports text, audio, images, video, and entire code repositories as input. Built-in tool use, including Google Search and code execution, is also available.

What To Consider When Choosing a Provider

Configuration: Given the context window of 1.0M tokens, applications passing very large inputs should confirm provider-side limits and latency expectations for long-context requests before deploying at scale.
Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
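
The configuration point about very large inputs can be checked before a request is sent. The sketch below uses a crude four-characters-per-token heuristic, which is an assumption and not the model's real tokenizer, to flag prompts that may not fit in a context window of 1.0M tokens. The names `estimateTokens` and `fitsContextWindow` and the 10% headroom are hypothetical choices.

```typescript
const CONTEXT_WINDOW_TOKENS = 1_000_000 // 1.0M tokens, per the model description
const CHARS_PER_TOKEN = 4               // assumption: rough average for English text
const HEADROOM = 0.9                    // assumption: leave 10% slack for estimate error

// Hypothetical helper: rough token estimate from character count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Returns true when the estimated prompt size stays within the
// context window after applying the headroom factor.
function fitsContextWindow(prompt: string): boolean {
  return estimateTokens(prompt) <= CONTEXT_WINDOW_TOKENS * HEADROOM
}

console.log(estimateTokens('Why is the sky blue?'))    // 5
console.log(fitsContextWindow('Why is the sky blue?')) // true
```

A check like this only catches obvious overruns. Provider-side limits and long-context latency still need to be confirmed with real requests.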

When to Use Gemini 2.5 Pro

Best For

Advanced coding and software engineering: Building visually compelling web applications, writing agentic code, performing large-scale code transformation and editing across entire repositories
Complex mathematical and scientific reasoning: Multi-step problems in mathematics, physics, chemistry, or logic that require sustained chain-of-thought reasoning without test-time augmentation
Research and long-document analysis: Processing entire codebases, academic papers, legal corpora, or research datasets within a context of 1.0M tokens to extract insights, connections, and answers
Hard benchmark-level tasks: Questions from expert-curated datasets, graduate-level reasoning problems, or tasks at the outer edge of what general-purpose models typically handle
Agentic applications requiring deep planning: Multi-step workflows where the model must reason across tools, plan sub-tasks, and produce executable or high-accuracy outputs

Consider Alternatives When

High-volume routine tasks: Translation, classification, and summarization where the reasoning depth of 2.5 Pro adds cost without improving output quality
Speed-first workloads: Response speed is paramount and accuracy requirements can be met by 2.5 Flash with thinking enabled
Smaller context windows suffice: Your application does not benefit from the context window of 1.0M tokens, making the pricing premium for Pro's larger capacity unnecessary
Embedding or retrieval workloads: A dedicated embedding model is architecturally appropriate for these use cases
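
The guidance in the two lists above can be condensed into a small routing function. This is only a sketch: the task categories are simplified from the lists, `pickModel` is a hypothetical helper, and the slug `google/gemini-2.5-flash` is assumed by analogy with `google/gemini-2.5-pro`. Embedding and retrieval workloads are left out because they call for a dedicated embedding model.

```typescript
type Task =
  | 'coding' | 'math-science' | 'long-document' | 'agentic' // "Best For"
  | 'translation' | 'classification' | 'summarization'       // routine, high volume

interface RequestProfile {
  task: Task
  latencySensitive?: boolean // true when response speed matters most
}

const PRO = 'google/gemini-2.5-pro'
const FLASH = 'google/gemini-2.5-flash' // assumed slug, verify before use

// Hypothetical helper: picks a model slug from a simplified request profile.
function pickModel(profile: RequestProfile): string {
  const routine: string[] = ['translation', 'classification', 'summarization']
  // Routine or speed-first work rarely needs Pro's reasoning depth.
  if (profile.latencySensitive || routine.indexOf(profile.task) !== -1) {
    return FLASH
  }
  return PRO
}

console.log(pickModel({ task: 'coding' }))        // google/gemini-2.5-pro
console.log(pickModel({ task: 'summarization' })) // google/gemini-2.5-flash
```

Real routing would likely weigh cost and measured quality as well, but even a coarse rule like this keeps high-volume traffic off the more expensive model.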

Conclusion

Gemini 2.5 Pro is purpose-built for the hardest problems: code that requires deep understanding of large repositories, mathematical reasoning at competition level, and research tasks that demand both breadth of knowledge and sustained logical precision. Teams tackling the most demanding inference workloads will find in 2.5 Pro a model whose thinking architecture and context window of 1.0M tokens were designed specifically for that class of challenge.

Frequently Asked Questions

What is Gemini 2.5 Pro's score on LMArena?
Gemini 2.5 Pro ranks highly on LMArena, which measures human preferences across a broad range of tasks. Check the LMArena leaderboard for the latest score, as rankings shift over time.
What coding benchmarks does 2.5 Pro perform strongly on?
Gemini 2.5 Pro scores 63.8% on SWE-Bench Verified with a custom agent setup. SWE-Bench Verified is the industry-standard benchmark for agentic code evaluation. The model also excels at creating web apps, agentic code applications, and code transformation.
How does 2.5 Pro's thinking capability differ from 2.5 Flash's?
Both models reason through problems before responding. Gemini 2.5 Pro is the Pro tier in the Gemini 2.5 family and posts strong results on coding, math, and science benchmarks. Gemini 2.5 Flash provides configurable thinking budgets and sits at the Pareto frontier of cost and performance.
What is Humanity's Last Exam and how does Gemini 2.5 Pro perform on it?
Humanity's Last Exam is a benchmark dataset created by hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. Gemini 2.5 Pro scores 18.8% on this benchmark without tool use.
What is the context window size?
Gemini 2.5 Pro has a context window of 1.0M tokens, the largest among Gemini 2.5 models, enabling it to process entire code repositories, lengthy research datasets, or extensive multi-document inputs in a single pass.
What tool use capabilities does 2.5 Pro have?
Google Search and code execution are available as built-in tools. The model can fetch real-time information, run code, and verify results within a single inference session.
Does 2.5 Pro support multimodal input?
Yes. The model accepts text, audio, images, video, and entire code repositories as input, maintaining the native multimodality that defines the Gemini model family.
Is Gemini 2.5 Pro generally available?
It launched as an experimental model on March 20, 2025. Google later promoted it to stable general availability as part of the Gemini 2.5 family expansion.
