What is the difference between GLM-4.7, GLM-4.7-Flash, and GLM-4.7-FlashX?

GLM-4.7 is the full-scale variant with maximum capability. GLM-4.7-Flash is optimized for faster inference with reduced capability. GLM-4.7-FlashX provides the fastest inference tier in the generation. All share the same API surface.

Is GLM 4.7 good for frontend development?

Yes. Z.ai cites improved frontend development results as a key change in this model generation.

How do I authenticate with GLM 4.7 through AI Gateway?

AI Gateway provides a unified API key. Configure it in your environment and use the model identifier to route requests. No separate Z.ai account is required, though BYOK is also supported.

How does GLM 4.7 handle multi-step agentic tasks?

Multi-step reasoning is a core improvement in this generation. The model maintains better coherence across extended tool-use sequences and planning steps compared to earlier GLM models.

What providers serve GLM 4.7 through AI Gateway?

GLM 4.7 is available through zai, novita, deepinfra, cerebras, bedrock. AI Gateway handles routing and automatic retries across providers.

Dashboard

GLM 4.7

GLM 4.7 is Z.ai's model released December 22, 2025 with major improvements in coding, tool usage, and multi-step reasoning. It uses a more natural conversational tone and shows improved frontend development results.

ReasoningTool UseImplicit Caching

index.ts

import { streamText } from 'ai'

const result = streamText({
  model: 'zai/glm-4.7',
  prompt: 'Why is the sky blue?'
})

Overview Playground About Providers Throughput Latency Uptime Status Similar FAQ

Playground

Try out GLM 4.7 by Z.ai. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider

Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Capabilities	ZDR	No Training	Release Date

Legal:Terms

•

Privacy

200K

6.9s

74tps

$0.60/M

$2.20/M

Read:$0.11/M

Write:—

—

12/22/2025

Legal:Terms

•

Privacy

205K

1.5s

78tps

$0.60/M

$2.20/M

Read:$0.1/M

Write:—

—

12/22/2025

Legal:Terms

•

Privacy

203K

0.6s

22tps

$0.43/M

$1.75/M

Read:$0.08/M

Write:—

—

12/22/2025

Legal:Terms

•

Privacy

131K

0.1s

665tps

$2.25/M

$2.75/M

Read:$2.25/M

Write:—

—

12/22/2025

Legal:Terms

•

Privacy

200K

0.4s

47tps

$0.60/M

$2.20/M

—

12/22/2025

More models by Z.ai

Model

Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Capabilities	Providers	ZDR	No Training	Release Date

205K

1.3s

46tps

$1.40/M

$4.40/M

Read:$0.26/M

Write:—

—

04/07/2026

200K

1.1s

47tps

$1.20/M

$4.00/M

Read:$0.24/M

Write:—

—

04/01/2026

203K

1.0s

20tps

$1.20/M

$4.00/M

Read:$0.24/M

Write:—

—

03/15/2026

203K

0.4s

61tps

$0.80/M

$2.56/M

Read:$0.16/M

Write:—

—

02/12/2026

205K

0.4s

105tps

$0.60/M

$2.20/M

Read:$0.11/M

Write:—

—

09/30/2025

200K

0.1s

$0.07/M

$0.40/M

Read:$0.01/M

Write:—

—

About GLM 4.7

GLM 4.7 was released December 22, 2025 as a capability upgrade in Z.ai's model lineup. It brings major improvements in coding, tool usage, and multi-step reasoning, with added focus on complex agentic tasks that require sustained planning across multiple steps.

The model introduces a more natural conversational tone compared to earlier GLM generations. This improves the experience in chat-based applications and interactive coding assistants. GLM 4.7 shows improved frontend development results for teams building UI components, web applications, and design-to-code workflows.

GLM 4.7 operates within a context window of 204.8K tokens and is available through AI Gateway with the standard unified API, built-in observability, and intelligent provider routing. It represents the full-scale offering in the 4.7 generation, complemented by GLM-4.7-Flash and GLM-4.7-FlashX for speed-optimized workloads.

What To Consider When Choosing a Provider

Configuration: GLM 4.7 targets frontend tasks. If your workload involves generating React components, CSS, or converting designs to code, benchmark it against general-purpose alternatives you already use.
Configuration: The 4.7 generation includes three tiers. GLM 4.7 provides maximum capability, GLM-4.7-Flash offers speed optimization, and GLM-4.7-FlashX provides the fastest inference. Choose based on your latency-capability tradeoff.
Configuration: Multi-step reasoning updates make GLM 4.7 more reliable for agentic pipelines. Test it against your existing agent benchmarks to quantify the improvement over GLM-4.6.
Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use GLM 4.7

Best For

Frontend development: HTML, CSS, React components, and design-to-code conversion benefit from the targeted improvements
Complex agentic tasks: Multi-step reasoning, tool usage, and sustained planning across extended interactions
Interactive coding assistants: The natural conversational tone improves developer experience
Full-stack code generation: Improvements in both coding capability and tool usage apply across the stack
Production applications: The highest capability tier in the GLM-4.7 generation, paired with AI Gateway observability

Consider Alternatives When

Latency-driven workloads: GLM-4.7-Flash or GLM-4.7-FlashX provides faster inference at reduced capability
Vision capabilities needed: Evaluate GLM-4.6V or GLM-4.5V for multimodal visual input
Advanced reasoning beyond 4.7: GLM-5 introduces multiple thinking modes and improved long-range planning
Cost-efficiency priority: The flash variants in the 4.7 generation or GLM-4.5-Air may be more economical when peak capability is not essential

Conclusion

GLM 4.7 advances Z.ai's model lineup with targeted improvements in the areas that matter most for modern development workflows: coding, tool usage, multi-step reasoning, and frontend generation. As the full-scale 4.7 model, it sets the capability ceiling for the generation while GLM-4.7-Flash and FlashX variants serve speed-sensitive workloads.

Frequently Asked Questions

What are the main improvements in GLM 4.7 over previous GLM models?
Coding, tool usage, and multi-step reasoning. It also uses a more natural conversational tone and shows improved frontend development results in this generation.
What is the difference between GLM-4.7, GLM-4.7-Flash, and GLM-4.7-FlashX?
GLM-4.7 is the full-scale variant with maximum capability. GLM-4.7-Flash is optimized for faster inference with reduced capability. GLM-4.7-FlashX provides the fastest inference tier in the generation. All share the same API surface.
Is GLM 4.7 good for frontend development?
Yes. Z.ai cites improved frontend development results as a key change in this model generation.
What is the context window for GLM 4.7?
204.8K tokens.
How do I authenticate with GLM 4.7 through AI Gateway?
AI Gateway provides a unified API key. Configure it in your environment and use the model identifier to route requests. No separate Z.ai account is required, though BYOK is also supported.
How does GLM 4.7 handle multi-step agentic tasks?
Multi-step reasoning is a core improvement in this generation. The model maintains better coherence across extended tool-use sequences and planning steps compared to earlier GLM models.
What providers serve GLM 4.7 through AI Gateway?
GLM 4.7 is available through zai, novita, deepinfra, cerebras, bedrock. AI Gateway handles routing and automatic retries across providers.

AI Cloud

Core Platform

Security

Company

Learn

Open Source

Use Cases

Tools

Users

GLM 4.7

Playground

Providers

More models by Z.ai

About GLM 4.7

What To Consider When Choosing a Provider

When to Use GLM 4.7

Best For

Consider Alternatives When

Conclusion

Frequently Asked Questions