GLM 4.6

GLM 4.6 is Z.ai's coding-focused model released September 30, 2025, with enhanced performance on both benchmarks and real-world programming tasks. It features an expanded context window of 204.8K tokens for handling large codebases and complex agent workflows.

Reasoning, Tool Use, Implicit Caching
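index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'zai/glm-4.6',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}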

Playground

Try out GLM 4.6 by Z.ai. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider  | Context | Latency | Throughput | Input   | Output  | Cache Read | Release Date | Legal          |
|-----------|---------|---------|------------|---------|---------|------------|--------------|----------------|
| Z.ai      | 200K    | 7.4s    | 105 tps    | $0.60/M | $2.20/M | $0.11/M    | 09/30/2025   | Terms, Privacy |
| DeepInfra | 203K    | 0.4s    | 28 tps     | $0.45/M | $1.90/M | $0.11/M    | 09/30/2025   | Terms, Privacy |
| Baseten   | 200K    | -       | -          | $0.60/M | $2.20/M | -          | 09/30/2025   | Terms, Privacy |
| Novita AI | 205K    | 2.0s    | 74 tps     | $0.60/M | $2.20/M | $0.11/M    | 09/30/2025   | Terms, Privacy |
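
The listed rates can be turned into a rough per-request comparison. The sketch below is illustrative only: it copies the input and output prices from the provider table, ignores cache reads, and uses hypothetical provider slugs (`zai`, `deepinfra`, `baseten`, `novita`) that should be replaced with the slugs copied from this page. `estimateCost` and `cheapestFirst` are names invented for this example.

```typescript
// Per-million-token rates copied from the provider table.
// The slugs are assumptions; copy the real ones from the Providers list.
interface ProviderRate {
  slug: string
  inputPerM: number // USD per 1M input tokens
  outputPerM: number // USD per 1M output tokens
}

const providerRates: ProviderRate[] = [
  { slug: 'zai', inputPerM: 0.6, outputPerM: 2.2 },
  { slug: 'deepinfra', inputPerM: 0.45, outputPerM: 1.9 },
  { slug: 'baseten', inputPerM: 0.6, outputPerM: 2.2 },
  { slug: 'novita', inputPerM: 0.6, outputPerM: 2.2 },
]

// Estimated USD cost of one request, ignoring cache reads.
function estimateCost(rate: ProviderRate, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * rate.inputPerM + (outputTokens / 1_000_000) * rate.outputPerM
}

// Provider slugs sorted cheapest-first for a given request shape.
// Ties keep their original order because Array.prototype.sort is stable.
function cheapestFirst(rates: ProviderRate[], inputTokens: number, outputTokens: number): string[] {
  return [...rates]
    .sort(
      (a, b) =>
        estimateCost(a, inputTokens, outputTokens) - estimateCost(b, inputTokens, outputTokens),
    )
    .map((rate) => rate.slug)
}

// Example: a 100K-token code context with a 2K-token answer.
const order = cheapestFirst(providerRates, 100_000, 2_000)
console.log(order)
```

Price is only one input: the same table shows large differences in latency and throughput, so a cheapest-first order is a starting point rather than a recommendation. How the resulting list is passed as a provider preference depends on your SDK version; check the docs before relying on a specific option name.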
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Z.ai

| Model | Context | Latency | Throughput | Input   | Output  | Cache Read | Providers                         | Release Date |
|-------|---------|---------|------------|---------|---------|------------|-----------------------------------|--------------|
| -     | 205K    | 1.3s    | 46 tps     | $1.40/M | $4.40/M | $0.26/M    | deepinfra, fireworks, novita, +1  | 04/07/2026   |
| -     | 200K    | 1.1s    | 47 tps     | $1.20/M | $4.00/M | $0.24/M    | zai                               | 04/01/2026   |
| -     | 203K    | 1.0s    | 20 tps     | $1.20/M | $4.00/M | $0.24/M    | zai                               | 03/15/2026   |
| -     | 203K    | 0.4s    | 61 tps     | $0.80/M | $2.56/M | $0.16/M    | bedrock, deepinfra, fireworks, +3 | 02/12/2026   |
| -     | 205K    | 0.1s    | 665 tps    | $2.25/M | $2.75/M | $2.25/M    | bedrock, cerebras, deepinfra, +2  | 12/22/2025   |
| -     | 200K    | 0.1s    | -          | $0.07/M | $0.40/M | $0.01/M    | bedrock, zai                      | -            |

About GLM 4.6

GLM 4.6 was released September 30, 2025 as Z.ai's dedicated coding model. It builds on the GLM-4.5 foundation with targeted improvements for software engineering workflows, benchmark performance, and real-world programming tasks.

The headline change is an expanded context window of 204.8K tokens (individual providers on this page list limits between 200K and 205K). You can fit entire codebases, long specification documents, and multi-file analysis into a single request. This benefits code generation tasks that depend on cross-file relationships, and agentic coding workflows that maintain state across extended interactions.
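
One way to take advantage of the large window is to pack source files into a single prompt while staying under a token budget. The following is a minimal sketch under stated assumptions: `estimateTokens` uses a rough four-characters-per-token heuristic rather than the model's real tokenizer, and `SourceFile` and `packFiles` are hypothetical helpers, not part of any SDK.

```typescript
interface SourceFile {
  path: string
  content: string
}

// Rough heuristic: about 4 characters per token.
// This is an assumption, not the GLM tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Concatenate files into one prompt, skipping any file that would
// push the estimated total over the budget.
function packFiles(
  files: SourceFile[],
  tokenBudget: number,
): { prompt: string; included: string[]; skipped: string[] } {
  const parts: string[] = []
  const included: string[] = []
  const skipped: string[] = []
  let used = 0

  for (const file of files) {
    // Label each file so the model can tell where one ends and the next begins.
    const block = `// File: ${file.path}\n${file.content}\n`
    const cost = estimateTokens(block)
    if (used + cost > tokenBudget) {
      skipped.push(file.path)
      continue
    }
    parts.push(block)
    included.push(file.path)
    used += cost
  }

  return { prompt: parts.join('\n'), included, skipped }
}

// Tiny demonstration with an artificially small budget.
const packed = packFiles(
  [
    { path: 'a.ts', content: 'export const a = 1' },
    { path: 'big.ts', content: 'x'.repeat(1000) },
    { path: 'b.ts', content: 'export const b = 2' },
  ],
  50,
)
console.log(packed.included, packed.skipped)
```

In practice the budget would be the provider's context limit minus headroom for the instructions and the model's output, and the estimate should be replaced with a real token count if one is available.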

GLM 4.6 shows enhanced performance on both public benchmarks and real-world programming tasks. Benchmark scores predict capability, but real-world coding involves ambiguous requirements, legacy code patterns, and iterative refinement. GLM 4.6 targets both dimensions.

What To Consider When Choosing a Provider

  • Context window: The context window of 204.8K tokens handles large codebases in a single pass. Structure your prompts to include relevant file context rather than relying on the model to infer missing dependencies.
  • Workload fit: GLM 4.6 is optimized for code generation and understanding. For general reasoning or conversational tasks, GLM-4.5 may provide a more balanced profile.
  • Cost monitoring: Coding tasks with large context inputs consume many tokens. Monitor usage through AI Gateway's built-in observability to track actual costs against estimates.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
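
As a sketch of the authentication point above, an application can check at startup which credential is available before sending any request. The environment variable names used here (`AI_GATEWAY_API_KEY`, `VERCEL_OIDC_TOKEN`) are assumptions for illustration; confirm the exact names in the AI Gateway documentation. `resolveCredential` is a hypothetical helper.

```typescript
type GatewayCredential =
  | { kind: 'api-key'; value: string }
  | { kind: 'oidc'; value: string }

// Pick a credential from an environment-like object.
// The variable names are assumptions; confirm them in the AI Gateway docs.
function resolveCredential(env: Record<string, string | undefined>): GatewayCredential {
  const apiKey = env['AI_GATEWAY_API_KEY']
  if (apiKey) {
    return { kind: 'api-key', value: apiKey }
  }

  const oidcToken = env['VERCEL_OIDC_TOKEN']
  if (oidcToken) {
    return { kind: 'oidc', value: oidcToken }
  }

  // Fail early with a clear message instead of at request time.
  throw new Error('No AI Gateway credential found: set an API key or an OIDC token')
}

// Example with an explicit object; in an application this would
// typically be the process environment.
const credential = resolveCredential({ AI_GATEWAY_API_KEY: 'example-key' })
console.log(credential.kind)
```

The helper prefers the API key when both are present; that ordering is a design choice for this sketch, not documented gateway behavior.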

When to Use GLM 4.6

Best For

  • Software engineering workflows: Code generation, debugging, refactoring, and code review across large repositories
  • Agentic coding tasks: Extended context and multi-step planning improve the quality of generated solutions
  • Large codebase analysis: The context window of 204.8K tokens fits cross-file dependencies and architectural patterns
  • Code migration and modernization: Understanding legacy patterns and generating updated code requires broad context
  • Technical documentation generation: The model can read and synthesize large amounts of source code into documentation

Consider Alternatives When

  • General-purpose workloads: For reasoning or conversation, GLM-4.5 provides broader capability without the coding specialization
  • Vision-enabled coding: GLM-4.6V combines vision input with coding capability for code-from-screenshot workflows
  • Faster inference priority: GLM-4.6V-Flash offers vision and coding at reduced latency when speed matters more than depth
  • Advanced coding improvements: GLM-4.7 includes further advancements in tool usage and multi-step reasoning for complex agentic tasks

Conclusion

GLM 4.6 targets the coding workload specifically, combining an expanded context window of 204.8K tokens with improvements in both benchmark and real-world programming performance. For teams building coding assistants, automated code review pipelines, or agentic development tools, it provides a focused alternative to general-purpose models.

Frequently Asked Questions

  • What makes GLM 4.6 different from GLM-4.5?

    GLM 4.6 is specifically optimized for coding tasks, with an expanded context window of 204.8K tokens and targeted improvements on programming benchmarks and real-world coding tasks. GLM-4.5 is the general-purpose model.

  • What is the context window for GLM 4.6?

    204.8K tokens, designed to handle large codebases, long specification documents, and multi-file analysis in a single request.

  • Can GLM 4.6 handle multi-file code analysis?

    Yes. The expanded context window lets you include multiple files in a single request, enabling the model to understand cross-file dependencies, imports, and architectural patterns.

  • How do I authenticate with GLM 4.6 through AI Gateway?

    AI Gateway provides a unified API key. Configure it in your environment and specify the model identifier. No separate Z.ai account is required, though BYOK is supported.

  • How does GLM 4.6 compare to GLM-4.7 for coding?

    GLM 4.6 introduced the coding-focused improvements in the GLM lineup. GLM-4.7 adds further gains in tool usage, multi-step reasoning, and frontend development, per Z.ai's release notes.

  • Is GLM 4.6 suitable for non-coding tasks?

    GLM 4.6 retains general language capability but is optimized for code. For conversational, reasoning, or general-purpose tasks, GLM-4.5 or GLM-5 may be more appropriate.

  • What is the pricing for GLM 4.6?

    Pricing appears on this page and updates as providers adjust their rates. AI Gateway routes traffic through the configured provider.