
DeepSeek V3.1

DeepSeek V3.1 is DeepSeek's August 21, 2025 model update introducing hybrid inference with selectable thinking and non-thinking modes in one endpoint. It strengthens tool use and multi-step agent capabilities over DeepSeek-V3.

Reasoning, Tool Use
index.ts
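import { streamText } from 'ai'

// Stream a response from DeepSeek V3.1 through AI Gateway
const result = streamText({
  model: 'deepseek/deepseek-v3.1',
  prompt: 'Why is the sky blue?'
})
for await (const text of result.textStream) process.stdout.write(text)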

Playground

Try out DeepSeek V3.1 by DeepSeek. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Context | Latency | Throughput | Input | Output | Cache Read | Release Date | Legal |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepInfra | 164K | 0.7s | 10tps | $0.21/M | $0.79/M | $0.13/M | 08/21/2025 | Terms, Privacy |
| Novita AI | 164K | 1.0s | 35tps | $0.27/M | $1.00/M | $0.14/M | 08/21/2025 | Terms, Privacy |
| Baseten | 164K | 0.2s | 162tps | $0.50/M | $1.50/M | - | 08/21/2025 | Terms, Privacy |
| Fireworks | 164K | - | - | $0.56/M | $1.68/M | $0.28/M | 08/21/2025 | Terms, Privacy |
| Together AI | 128K | - | - | $0.60/M | $1.70/M | - | 08/21/2025 | Terms, Privacy |
| SambaNova | 128K | - | - | $3.00/M | $4.50/M | - | 08/21/2025 | Terms, Privacy |
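
To compare providers for a specific workload, the per-million-token prices can be turned into a per-request cost and a preference order. The sketch below uses the input and output prices listed above for four providers. The lowercase slugs and the `providerOptions.gateway.order` shape are assumptions for illustration; copy the real slugs from this page and confirm the option name in the AI Gateway docs.

```typescript
// Per-million-token prices (USD) taken from the provider listing above.
interface ProviderRate {
  slug: string; // assumed slug; copy the real one from the page
  inputPerM: number;
  outputPerM: number;
}

const rates: ProviderRate[] = [
  { slug: 'deepinfra', inputPerM: 0.21, outputPerM: 0.79 },
  { slug: 'novita', inputPerM: 0.27, outputPerM: 1.0 },
  { slug: 'baseten', inputPerM: 0.5, outputPerM: 1.5 },
  { slug: 'fireworks', inputPerM: 0.56, outputPerM: 1.68 },
];

// Cost of one request in USD for the given token counts.
function requestCost(rate: ProviderRate, inputTokens: number, outputTokens: number): number {
  return (inputTokens * rate.inputPerM + outputTokens * rate.outputPerM) / 1e6;
}

// Provider slugs sorted from cheapest to most expensive for this workload.
function cheapestFirst(inputTokens: number, outputTokens: number): string[] {
  return [...rates]
    .sort(
      (a, b) =>
        requestCost(a, inputTokens, outputTokens) - requestCost(b, inputTokens, outputTokens),
    )
    .map((rate) => rate.slug);
}

// Example workload: 10,000 input tokens and 2,000 output tokens per request.
const order = cheapestFirst(10000, 2000);

// Assumed option shape for expressing a provider preference.
const providerOptions = { gateway: { order } };
console.log(providerOptions);
```

Price is only one axis. The same listing shows Baseten with the lowest latency and highest throughput, so a latency-sensitive workload might sort on those columns instead.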
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
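
Both charts report a P50, which is the median of the observed samples. Here is a minimal sketch of that calculation, assuming a plain median over raw samples; how the gateway actually aggregates traffic is not described here, and the sample values are made up.

```typescript
// Median (P50) of a list of samples; returns NaN for an empty list.
function p50(samples: number[]): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd count: middle element. Even count: mean of the two middle elements.
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// Hypothetical time-to-first-token samples in milliseconds.
const ttftMs = [180, 220, 700, 950, 1200];
const medianMs = p50(ttftMs);

// Convert milliseconds to seconds for display.
const medianSeconds = medianMs / 1000;
console.log(`P50 TTFT: ${medianMs} ms (${medianSeconds}s)`);
```

Because the median ignores the magnitude of outliers, a few very slow requests move P50 far less than they would move an average.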

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by DeepSeek

Model
Context
Latency
Throughput
Input
Output
Cache
Web Search
Per Query
Capabilities
Providers
ZDR
No Training
Release Date
1M
0.6s
87tps
$0.14/M$0.28/M
Read:$0.0/M
Write:
deepinfra logo
deepseek logo
novita logo
04/23/2026
1M
0.6s
60tps
$0.43/M$0.87/M
Read:$0.0/M
Write:
deepinfra logo
deepseek logo
fireworks logo
+1
04/23/2026
164K
0.3s
67tps
$0.28/M$0.42/M
Read:$0.03/M
Write:
bedrock logo
deepinfra logo
deepseek logo
+2
12/01/2025
164K
0.3s
83tps
$0.28/M$0.42/M
Read:$0.03/M
Write:
bedrock logo
deepinfra logo
deepseek logo
+2
12/01/2025
131K
2.3s
28tps
$0.27/M$1.00/M
Read:$0.14/M
Write:
novita logo
09/22/2025
164K
1.0s
120tps
$0.77/M$0.77/M
Read:$0.14/M
Write:
baseten logo
novita logo
12/26/2024

About DeepSeek V3.1

DeepSeek V3.1 was released August 21, 2025. Its central change consolidates thinking and non-thinking inference, which previously required separate deployments, into one model. Non-thinking mode is exposed through the deepseek-chat API identifier and thinking mode through deepseek-reasoner. The dual-mode design lets you route requests to different inference behaviors without maintaining separate integrations, which simplifies agent architectures where some steps need reasoning and others don't.
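
One way to picture per-step routing is a small lookup from step type to identifier. This is a sketch under stated assumptions: the step kinds and the `modelForStep` helper are hypothetical, and only the two identifiers come from the text above. How a given gateway or provider selects the mode may differ, so check its documentation.

```typescript
// Hypothetical step kinds for an agent pipeline.
type StepKind = 'plan' | 'codegen' | 'parse' | 'classify';

// Identifiers named in the text above.
const THINKING_MODEL = 'deepseek-reasoner';
const NON_THINKING_MODEL = 'deepseek-chat';

// Which steps are assumed to benefit from extended reasoning.
const needsReasoning: Record<StepKind, boolean> = {
  plan: true,
  codegen: true,
  parse: false,
  classify: false,
};

// Pick the identifier for a step based on whether it needs reasoning.
function modelForStep(kind: StepKind): string {
  return needsReasoning[kind] ? THINKING_MODEL : NON_THINKING_MODEL;
}

const pipeline: StepKind[] = ['plan', 'parse', 'codegen', 'classify'];
const routed = pipeline.map((kind) => ({ kind, model: modelForStep(kind) }));
console.log(routed);
```

Keeping the mapping in one table makes it easy to move a step between modes after measuring its latency and quality.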

Thinking mode reaches answers in less time than DeepSeek-R1-0528 on equivalent tasks. Strict function calling is available in beta, alongside Anthropic API format compatibility, which widens the range of infrastructure that can route to DeepSeek V3.1 without format conversion.

DeepSeek V3.1 targets stronger multi-step reasoning for complex search tasks and better performance on SWE-Bench and Terminal-Bench. It also ships a new tokenizer with a refreshed chat template. Current AI Gateway rates appear on this page.

What To Consider When Choosing a Provider

  • Configuration: Two usage modes share the same model. Test both thinking and non-thinking paths in your integration to confirm your application correctly interprets response structure under each mode.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
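
For the configuration point above, a small normalizer can make both response shapes look the same to the rest of an application. This sketch assumes that a thinking-mode message carries an extra `reasoning_content` string next to `content`; that field name is an assumption about the raw response, and SDKs may surface reasoning differently. The fixtures are invented.

```typescript
// Assumed message shape: thinking mode adds `reasoning_content`.
interface AssistantMessage {
  content: string;
  reasoning_content?: string | null;
}

interface Normalized {
  answer: string;
  reasoning: string | null;
  mode: 'thinking' | 'non-thinking';
}

// Map either response shape onto one structure.
function normalize(message: AssistantMessage): Normalized {
  // Treat missing, null, or blank reasoning as "no reasoning present".
  const reasoning = message.reasoning_content?.trim() || null;
  return {
    answer: message.content.trim(),
    reasoning,
    mode: reasoning ? 'thinking' : 'non-thinking',
  };
}

// Invented fixtures covering both paths.
const thinkingReply = normalize({
  content: 'Rayleigh scattering.',
  reasoning_content: 'Shorter wavelengths scatter more strongly.',
});
const directReply = normalize({ content: 'Rayleigh scattering.' });
console.log(thinkingReply.mode, directReply.mode);
```

Running fixtures for both paths through one function like this is a cheap way to confirm the integration handles each mode before sending live traffic.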

When to Use DeepSeek V3.1

Best For

  • Mixed agent pipelines: Combine reasoning-heavy steps (tool planning, code generation) with fast-response steps (parsing, classification) through a single endpoint
  • Software engineering automation: SWE-Bench and Terminal-Bench improvements translate to better code generation and execution performance
  • Anthropic API compatibility: Existing Anthropic-format integrations route to DeepSeek V3.1 with minimal integration change
  • Complex multi-step search: The thinking mode's improved efficiency reduces total response latency for multi-step workflows
  • Upgrading from DeepSeek-V3: Backward-compatible API routing plus optional thinking mode

Consider Alternatives When

  • Pure reasoning workloads: DeepSeek-R1 remains the dedicated reasoning specialist
  • Multilingual stability is critical: DeepSeek-V3.1 Terminus addresses output-consistency issues with Chinese-English code-switching
  • Straightforward chat or completion: DeepSeek-V3 may be more cost-efficient for high-volume workloads without hybrid inference needs

Conclusion

DeepSeek V3.1 consolidates thinking and non-thinking modes into a single endpoint, simplifying deployment for reasoning-capable systems. It adds capability over DeepSeek-V3 for agentic and software engineering tasks.

Frequently Asked Questions

  • What does "hybrid inference" mean for DeepSeek V3.1?

    The same model weights support both a thinking mode (extended chain-of-thought) and a non-thinking mode (direct completion). Select the mode by calling deepseek-reasoner for thinking or deepseek-chat for non-thinking. Both identifiers are served by the same model, so no separate deployment is needed.

  • Is DeepSeek V3.1's thinking mode faster than DeepSeek-R1?

    Yes. DeepSeek-V3.1-Think reaches answers in less time than DeepSeek-R1-0528 on equivalent tasks.

  • Does DeepSeek V3.1 support the Anthropic API format?

    Yes. Existing Anthropic-format integrations can route to DeepSeek V3.1 without additional conversion.

  • What is strict function calling and is it available in DeepSeek V3.1?

    It's in beta for DeepSeek V3.1. Strict function calling requires tool call arguments to match the provided JSON schema exactly.

  • What is the context window for DeepSeek V3.1?

    163.8K tokens for both thinking and non-thinking modes. Some providers listed on this page (Together AI and SambaNova) expose 128K.
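
To illustrate the strict function calling answer above, the sketch below checks tool call arguments against a schema and reports every mismatch. It is a simplified stand-in, not DeepSeek's validator: the `StrictSchema` type covers only flat objects with primitive fields, and the `getWeatherSchema` tool is hypothetical.

```typescript
type JsonType = 'string' | 'number' | 'boolean';

// Simplified schema: flat object, primitive fields only.
interface StrictSchema {
  properties: Record<string, JsonType>;
  required: string[];
}

// Hypothetical tool schema for illustration.
const getWeatherSchema: StrictSchema = {
  properties: { city: 'string', days: 'number' },
  required: ['city', 'days'],
};

// List every way the arguments deviate from the schema.
function violations(schema: StrictSchema, args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of schema.required) {
    if (!(key in args)) problems.push(`missing: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const expected = schema.properties[key];
    if (expected === undefined) problems.push(`unexpected: ${key}`);
    else if (typeof value !== expected) problems.push(`wrong type: ${key}`);
  }
  return problems;
}

const exact = violations(getWeatherSchema, { city: 'Oslo', days: 3 });
const loose = violations(getWeatherSchema, { city: 'Oslo', days: '3', units: 'c' });
console.log(exact, loose);
```

An exact match yields an empty list. The second call shows the kind of drift that strict mode is meant to rule out: a number sent as a string and a field the schema never declared.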