Skip to content

DeepSeek V4 Flash

DeepSeek V4 Flash is DeepSeek's April 23, 2026 efficiency-tier model in the V4 series. It pairs a hybrid attention architecture with a context window of 1.0M tokens and supports reasoning, tool use, and implicit caching.

ReasoningTool UseImplicit Caching
index.ts
import { streamText } from 'ai'
const result = streamText({
model: 'deepseek/deepseek-v4-flash',
prompt: 'Why is the sky blue?'
})
Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.