Gemini 2.5 Flash

Gemini 2.5 Flash is Google's first fully hybrid reasoning model, letting developers toggle thinking on or off and set thinking budgets to tune the balance between quality, cost, and latency, all on top of the fast, multimodal foundation of 2.0 Flash.

File Input, Reasoning, Tool Use, Vision (Image), Web Search, Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-2.5-flash',
  prompt: 'Why is the sky blue?'
})

About Gemini 2.5 Flash

Gemini 2.5 Flash builds directly on the 2.0 Flash foundation, carrying forward its speed and cost characteristics while adding a major reasoning upgrade. It launched in preview as Google's first fully hybrid reasoning model, a classification that sets it apart from both the 2.0 Flash generation and pure thinking models.

The hybrid design means thinking is not always on. You can disable thinking entirely to keep 2.0 Flash response speed, or enable it and set a thinking budget to control how much deliberation the model applies before answering. With thinking on, 2.5 Flash improves on the 2.0 generation for reasoning-intensive tasks. Its performance-to-cost ratio places it on the Pareto frontier of quality versus cost: it stays competitive on quality without requiring the full resource commitment of 2.5 Pro. This makes it well suited to applications where some prompts are routine and others are complex, and you want a single model that adapts to both.
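As a rough illustration of the routine-versus-complex split described above, the sketch below picks a thinking budget per prompt and wraps it in a provider options object. Everything here is an assumption made for illustration: the helper names, the keyword heuristic, and the 24576-token ceiling are hypothetical, and the `google.thinkingConfig.thinkingBudget` shape is assumed to match what the Google provider expects, so check it against the provider documentation before relying on it.

```typescript
// Sketch only: the heuristic and helper names are hypothetical, not part of any SDK.

// Assumed upper bound for the 2.5 Flash thinking budget, in tokens.
const MAX_THINKING_BUDGET = 24576

// Hypothetical markers suggesting a prompt needs multi-step reasoning.
const COMPLEX_HINTS = ['prove', 'step by step', 'debug', 'optimize', 'derive']

// Pick a thinking budget: 0 disables thinking for routine prompts,
// a larger value lets the model deliberate on complex ones.
function chooseThinkingBudget(prompt: string): number {
  const text = prompt.toLowerCase()
  const looksComplex = COMPLEX_HINTS.some((hint) => text.includes(hint))
  if (!looksComplex) return 0
  // Scale with prompt length, clamped to the assumed maximum.
  return Math.min(MAX_THINKING_BUDGET, 1024 + text.length * 4)
}

// Build provider options in the shape assumed for the Google provider.
function buildProviderOptions(thinkingBudget: number) {
  return { google: { thinkingConfig: { thinkingBudget } } }
}
```

The returned object is meant to be passed as `providerOptions` alongside `model` and `prompt` in a `streamText` call. A fixed budget per route or per feature would work just as well; the keyword check is only a stand-in for whatever signal your application already has about prompt difficulty.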

Gemini 2.5 Flash also integrates with tools including Google Search and code execution, and accepts multimodal input across text, images, video, and audio. The context window is 1M tokens, maintaining the long-context capability of the 2.0 Flash generation.
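To make the multimodal input concrete, here is a minimal sketch of a user message that pairs a text question with one image. The `buildImageQuestion` helper and the local types are hypothetical; the part shapes (`type: 'text'` and `type: 'image'`) are assumed to mirror the AI SDK's user-content parts, and the URL in the usage note is a placeholder.

```typescript
// Local types, assumed to mirror the AI SDK's user-content parts.
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: URL }
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> }

// Hypothetical helper: combine a question and one image into a single user message.
function buildImageQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      { type: 'image', image: new URL(imageUrl) }
    ]
  }
}
```

For example, `buildImageQuestion('What is in this picture?', 'https://example.com/photo.png')` yields a message that could be supplied in a `messages` array in place of the plain `prompt` string. Video and audio inputs would need their own part types, which this sketch does not cover.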