Skip to content

Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is the GA release of the efficiency tier in the Gemini 3.1 generation, with improvements in reasoning, multimodal understanding, agentic tool use, and long-context performance over 2.5 Flash Lite, plus four configurable thinking levels and a context window of 1M tokens.

ReasoningTool UseImplicit CachingFile InputVision (Image)Web Search
index.ts
import { streamText } from 'ai'
const result = streamText({
model: 'google/gemini-3.1-flash-lite',
prompt: 'Why is the sky blue?'
})

More models by Google

Model
Context
Latency
Throughput
Input
Output
Cache
Web Search
Per Query
Capabilities
Providers
ZDR
No Training
Release Date
1M
2.4s
290tps
$1.50/M$9.00/M
Read:$0.15/M
Write:
$14.00/K
+ input costs
google logo
vertex logo
05/19/2026
1M
0.8s
237tps
$0.25/M$1.50/M
Read:$0.03/M
Write:
$14.00/K
+ input costs
google logo
vertex logo
03/03/2026
1M
4.1s
173tps
$2.00/M
$12.00/M
Read:
$0.2/M
Write:
$14.00/K
+ input costs
google logo
vertex logo
02/19/2026
1M
0.6s
179tps
$0.50/M
$3.00/M
Read:
$0.05/M
Write:
$14.00/K
+ input costs
google logo
vertex logo
12/17/2025
1M
0.7s
218tps
$0.10/M$0.40/M
Read:$0.01/M
Write:
$35.00/K
+ input costs
google logo
vertex logo
06/17/2025
1M
0.4s
190tps
$0.30/M$2.50/M
Read:$0.03/M
Write:
$35.00/K
+ input costs
google logo
vertex logo
03/20/2025