Gemini 2.5 Flash Lite

Gemini 2.5 Flash Lite is the fastest and most affordable model in the Gemini 2.5 family, with configurable thinking, a context window of 1.0M tokens, and benchmark improvements over 2.0 Flash-Lite across coding, math, and science, at a price designed for high-throughput agentic pipelines.

File InputReasoningTool UseVision (Image)Web SearchImplicit Caching

index.ts

import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-2.5-flash-lite',
  prompt: 'Why is the sky blue?'
})

Overview Playground About Providers Throughput Latency Uptime Status Similar FAQ

About Gemini 2.5 Flash Lite

Gemini 2.5 Flash Lite is the efficiency tier of the Gemini 2.5 family, released June 17, 2025 alongside 2.5 Flash and 2.5 Pro going to general availability. It runs faster and costs less than any other 2.5 model while outperforming 2.0 Flash-Lite on benchmarks that matter for real-world developer tasks: coding, mathematics, scientific reasoning, and instruction following.

Configurable thinking is the feature that most distinguishes Gemini 2.5 Flash Lite from 2.0 Flash-Lite. At inference time, you set a thinking level (minimal, low, medium, or high) to allocate more deliberation to harder problems without switching endpoints. This is the same thinking mechanism available in 2.5 Flash and 2.5 Pro, scaled down to the lite budget. For tasks that occasionally need more reasoning depth than a pure speed-first model provides, the thinking toggle avoids the cost jump of routing to a full reasoning model.

For teams running 2.0 Flash-Lite in production and evaluating a 2.5 upgrade path, Gemini 2.5 Flash Lite is the migration-friendly choice: better benchmark performance, thinking capability, and latency that matches or beats the previous generation.

AI Cloud

Core Platform

Security

Company

Learn

Open Source

Use Cases

Tools

Users

Gemini 2.5 Flash Lite

About Gemini 2.5 Flash Lite