Gemini 2.5 Flash Lite
Gemini 2.5 Flash Lite is the fastest and most affordable model in the Gemini 2.5 family, with configurable thinking, a context window of 1.0M tokens, and benchmark improvements over 2.0 Flash-Lite across coding, math, and science, at a price designed for high-throughput agentic pipelines.
import { streamText } from 'ai'
const result = streamText({ model: 'google/gemini-2.5-flash-lite', prompt: 'Why is the sky blue?'})About Gemini 2.5 Flash Lite
Gemini 2.5 Flash Lite is the efficiency tier of the Gemini 2.5 family, released June 17, 2025 alongside 2.5 Flash and 2.5 Pro going to general availability. It runs faster and costs less than any other 2.5 model while outperforming 2.0 Flash-Lite on benchmarks that matter for real-world developer tasks: coding, mathematics, scientific reasoning, and instruction following.
Configurable thinking is the feature that most distinguishes Gemini 2.5 Flash Lite from 2.0 Flash-Lite. At inference time, you set a thinking level (minimal, low, medium, or high) to allocate more deliberation to harder problems without switching endpoints. This is the same thinking mechanism available in 2.5 Flash and 2.5 Pro, scaled down to the lite budget. For tasks that occasionally need more reasoning depth than a pure speed-first model provides, the thinking toggle avoids the cost jump of routing to a full reasoning model.
For teams running 2.0 Flash-Lite in production and evaluating a 2.5 upgrade path, Gemini 2.5 Flash Lite is the migration-friendly choice: better benchmark performance, thinking capability, and latency that matches or beats the previous generation.