Skip to content

GLM 4.7

GLM 4.7 is Z.ai's model released December 22, 2025 with major improvements in coding, tool usage, and multi-step reasoning. It uses a more natural conversational tone and shows improved frontend development results.

ReasoningTool UseImplicit Caching
index.ts
import { streamText } from 'ai'
const result = streamText({
model: 'zai/glm-4.7',
prompt: 'Why is the sky blue?'
})

Playground

Try out GLM 4.7 by Z.ai. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider
Context
Latency
Throughput
Input
Output
Cache
Web Search
Per Query
Capabilities
ZDR
No Training
Release Date
Z.ai
Legal:Terms
Privacy
200K
6.9s
74tps
$0.60/M$2.20/M
Read:$0.11/M
Write:
12/22/2025
Novita AI
Legal:Terms
Privacy
205K
1.5s
78tps
$0.60/M$2.20/M
Read:$0.1/M
Write:
12/22/2025
DeepInfra
Legal:Terms
Privacy
203K
0.6s
22tps
$0.43/M$1.75/M
Read:$0.08/M
Write:
12/22/2025
Cerebras
Legal:Terms
Privacy
131K
0.1s
665tps
$2.25/M$2.75/M
Read:$2.25/M
Write:
12/22/2025
Amazon Bedrock
Legal:Terms
Privacy
200K
0.4s
47tps
$0.60/M$2.20/M
12/22/2025
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Z.ai

Model
Context
Latency
Throughput
Input
Output
Cache
Web Search
Per Query
Capabilities
Providers
ZDR
No Training
Release Date
205K
1.3s
46tps
$1.40/M$4.40/M
Read:$0.26/M
Write:
deepinfra logo
fireworks logo
novita logo
+1
04/07/2026
200K
1.1s
47tps
$1.20/M$4.00/M
Read:$0.24/M
Write:
zai logo
04/01/2026
203K
1.0s
20tps
$1.20/M$4.00/M
Read:$0.24/M
Write:
zai logo
03/15/2026
203K
0.4s
61tps
$0.80/M$2.56/M
Read:$0.16/M
Write:
bedrock logo
deepinfra logo
fireworks logo
+3
02/12/2026
205K
0.4s
105tps
$0.60/M$2.20/M
Read:$0.11/M
Write:
baseten logo
deepinfra logo
novita logo
+1
09/30/2025
200K
0.1s
$0.07/M$0.40/M
Read:$0.01/M
Write:
bedrock logo
zai logo

About GLM 4.7

GLM 4.7 was released December 22, 2025 as a capability upgrade in Z.ai's model lineup. It brings major improvements in coding, tool usage, and multi-step reasoning, with added focus on complex agentic tasks that require sustained planning across multiple steps.

The model introduces a more natural conversational tone compared to earlier GLM generations. This improves the experience in chat-based applications and interactive coding assistants. GLM 4.7 shows improved frontend development results for teams building UI components, web applications, and design-to-code workflows.

GLM 4.7 operates within a context window of 204.8K tokens and is available through AI Gateway with the standard unified API, built-in observability, and intelligent provider routing. It represents the full-scale offering in the 4.7 generation, complemented by GLM-4.7-Flash and GLM-4.7-FlashX for speed-optimized workloads.

What To Consider When Choosing a Provider

  • Configuration: GLM 4.7 targets frontend tasks. If your workload involves generating React components, CSS, or converting designs to code, benchmark it against general-purpose alternatives you already use.
  • Configuration: The 4.7 generation includes three tiers. GLM 4.7 provides maximum capability, GLM-4.7-Flash offers speed optimization, and GLM-4.7-FlashX provides the fastest inference. Choose based on your latency-capability tradeoff.
  • Configuration: Multi-step reasoning updates make GLM 4.7 more reliable for agentic pipelines. Test it against your existing agent benchmarks to quantify the improvement over GLM-4.6.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use GLM 4.7

Best For

  • Frontend development: HTML, CSS, React components, and design-to-code conversion benefit from the targeted improvements
  • Complex agentic tasks: Multi-step reasoning, tool usage, and sustained planning across extended interactions
  • Interactive coding assistants: The natural conversational tone improves developer experience
  • Full-stack code generation: Improvements in both coding capability and tool usage apply across the stack
  • Production applications: The highest capability tier in the GLM-4.7 generation, paired with AI Gateway observability

Consider Alternatives When

  • Latency-driven workloads: GLM-4.7-Flash or GLM-4.7-FlashX provides faster inference at reduced capability
  • Vision capabilities needed: Evaluate GLM-4.6V or GLM-4.5V for multimodal visual input
  • Advanced reasoning beyond 4.7: GLM-5 introduces multiple thinking modes and improved long-range planning
  • Cost-efficiency priority: The flash variants in the 4.7 generation or GLM-4.5-Air may be more economical when peak capability is not essential

Conclusion

GLM 4.7 advances Z.ai's model lineup with targeted improvements in the areas that matter most for modern development workflows: coding, tool usage, multi-step reasoning, and frontend generation. As the full-scale 4.7 model, it sets the capability ceiling for the generation while GLM-4.7-Flash and FlashX variants serve speed-sensitive workloads.

Frequently Asked Questions

  • What are the main improvements in GLM 4.7 over previous GLM models?

    Coding, tool usage, and multi-step reasoning. It also uses a more natural conversational tone and shows improved frontend development results in this generation.

  • What is the difference between GLM-4.7, GLM-4.7-Flash, and GLM-4.7-FlashX?

    GLM-4.7 is the full-scale variant with maximum capability. GLM-4.7-Flash is optimized for faster inference with reduced capability. GLM-4.7-FlashX provides the fastest inference tier in the generation. All share the same API surface.

  • Is GLM 4.7 good for frontend development?

    Yes. Z.ai cites improved frontend development results as a key change in this model generation.

  • What is the context window for GLM 4.7?

    204.8K tokens.

  • How do I authenticate with GLM 4.7 through AI Gateway?

    AI Gateway provides a unified API key. Configure it in your environment and use the model identifier to route requests. No separate Z.ai account is required, though BYOK is also supported.

  • How does GLM 4.7 handle multi-step agentic tasks?

    Multi-step reasoning is a core improvement in this generation. The model maintains better coherence across extended tool-use sequences and planning steps compared to earlier GLM models.

  • What providers serve GLM 4.7 through AI Gateway?

    GLM 4.7 is available through zai, novita, deepinfra, cerebras, bedrock. AI Gateway handles routing and automatic retries across providers.