Gemini 3 Flash

Gemini 3 Flash delivers Gemini 3's pro-grade reasoning at flash-level latency and cost. It uses 30% fewer tokens than Gemini 2.5 models while outperforming them across most benchmarks.

Reasoning, Tool Use, File Input, Vision (Image), Web Search, tiered-cost, Implicit Caching
index.ts
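import { streamText } from 'ai'
// Stream a response from Gemini 3 Flash.
const result = streamText({
  model: 'google/gemini-3-flash',
  prompt: 'Why is the sky blue?',
})
// Print each text chunk as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}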

About Gemini 3 Flash

Gemini 3 Flash is Google's speed-optimized model in the Gemini 3 generation, combining Gemini 3's reasoning depth with the efficiency profile of the Flash tier. It significantly outperforms Gemini 2.5 Pro across most benchmarks, so a speed-tier model now surpasses the previous generation's flagship. It does so while consuming 30% fewer tokens than Gemini 2.5 models and running at 3x the speed of its predecessors.

Thinking is first-class in Gemini 3 Flash. The thinkingLevel provider option controls how deeply the model reasons, and includeThoughts surfaces its intermediate reasoning steps. Exposing those steps helps when debugging multi-step pipelines, constructing chain-of-thought datasets, or validating that the model reasons through a problem correctly. Set thinkingLevel to high when the task demands deeper inference and your latency budget allows it.
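
As a rough illustration of the two options named above, the sketch below builds a provider-options object that requests deeper reasoning and asks for thoughts to be returned. Several details here are assumptions rather than something this page states: the helper name buildThinkingOptions is hypothetical, the nesting under google.thinkingConfig follows the shape the AI SDK's Google provider is understood to use, and 'low' is assumed to be a valid level alongside the 'high' value mentioned above. Check the provider documentation before relying on the exact shape.

```typescript
// Assumed set of levels; only 'high' is mentioned on this page.
type ThinkingLevel = 'low' | 'high'

// Assumed shape of the provider options (nesting is not confirmed by this page).
interface ThinkingProviderOptions {
  google: {
    thinkingConfig: {
      thinkingLevel: ThinkingLevel
      includeThoughts: boolean
    }
  }
}

// Hypothetical helper: builds the options object for a request.
function buildThinkingOptions(
  thinkingLevel: ThinkingLevel,
  includeThoughts = true,
): ThinkingProviderOptions {
  return { google: { thinkingConfig: { thinkingLevel, includeThoughts } } }
}

// Deeper reasoning with thoughts surfaced, for tasks where latency allows it.
const providerOptions = buildThinkingOptions('high')
console.log(JSON.stringify(providerOptions))

// Assumed usage with the AI SDK (requires the 'ai' package):
// const result = streamText({
//   model: 'google/gemini-3-flash',
//   prompt: 'Why is the sky blue?',
//   providerOptions,
// })
```

Keeping the options in a small helper makes it easy to switch between levels per request, for example using a lower level for chat traffic and high only for batch jobs that can absorb the extra latency.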

Because Gemini 3 Flash sits at the intersection of quality and throughput, it fits a wide range of real-world traffic patterns, from low-latency chat interfaces to batch document processing pipelines. Accessing it through AI Gateway adds observability, automatic retries, and provider failover without requiring a Google Cloud account.