MindStudio
Text Generation Model

GLM 4.6

GLM-4.6 is a powerful 357B Mixture-of-Experts language model from Zhipu AI featuring a 200K context window, advanced reasoning, and top-tier coding and agentic capabilities.

Publisher Z.ai
Type Text
Context Window 200,000 tokens
Training Data Cutoff September 2025
Input $0.43/MTok
Output $1.74/MTok
Provider DeepInfra
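The listed rates make per-request cost easy to estimate. A minimal sketch, using the input and output prices above; the helper function is illustrative, not part of any official SDK:

```python
# Estimate request cost from the listed DeepInfra rates for GLM-4.6.
INPUT_PER_MTOK = 0.43   # USD per 1,000,000 input tokens
OUTPUT_PER_MTOK = 1.74  # USD per 1,000,000 output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_PER_MTOK

# A full-context request (200K tokens in) with a maximum-length
# reply (16,384 tokens out) comes to a bit over 11 cents:
print(f"${estimate_cost(200_000, 16_384):.4f}")  # prints $0.1145
```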

357B MoE model with 200K context and tool-use reasoning

GLM-4.6 is a large language model developed by Zhipu AI (Z.ai), built on a Mixture-of-Experts architecture with approximately 357 billion parameters. It supports both English and Chinese, carries a 200,000-token context window, and is released under the MIT license, which permits commercial and personal use with only minimal obligations (retaining the copyright and license notice). The model was released in late 2025 and represents Zhipu AI's flagship offering in the GLM series.

GLM-4.6 is designed for tasks that require extended context handling, multi-step reasoning, and agentic workflows. A notable characteristic is its ability to invoke tools during the reasoning process itself — not only after completing a chain of thought — which enables more dynamic problem-solving in agent-based applications. It is well suited for developers and researchers working on complex coding tasks, long-document analysis, bilingual applications, and automated multi-step pipelines.

What GLM 4.6 supports

Extended Context Window

Processes up to 200,000 tokens in a single request, equivalent to roughly 150,000 words, enabling analysis of long documents and large codebases without losing earlier context.

Tool-Use Reasoning

Supports tool calling during the reasoning process itself, allowing the model to query APIs or search for information while thinking through a problem rather than only after.
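In practice, tools are declared in the request and the model decides when to call them. A minimal sketch of a tool-calling request body in the OpenAI-compatible chat format that providers such as DeepInfra accept; the model id, the `get_weather` tool, and its schema are illustrative assumptions, not fixed names:

```python
import json

def build_tool_request(question: str) -> dict:
    """Build a chat request that lets the model call tools while reasoning."""
    return {
        "model": "zai-org/GLM-4.6",  # model id as listed on DeepInfra (assumed)
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",  # model decides whether and when to call
    }

payload = build_tool_request("Do I need an umbrella in Beijing today?")
print(json.dumps(payload, indent=2)[:60])
```

The response's `tool_calls` field then carries any requested invocations; your application executes them and sends the results back as `tool` messages for the model to continue reasoning over.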

Code Generation

Handles real-world programming tasks including front-end web page generation and integrates with coding tools such as Claude Code, Cline, Roo Code, and Kilo Code.

Agentic Workflows

Built for multi-step agent pipelines, performing well on tool-use benchmarks and integrating into agent frameworks for automated task execution.

Bilingual Language Support

Natively supports both English and Chinese, making it suitable for bilingual applications and cross-language document processing.

Long-Form Text Generation

Produces extended written content and handles role-playing scenarios, with outputs tuned toward human-preferred writing style and coherence.

MoE Architecture

Uses a Mixture-of-Experts design with approximately 357 billion total parameters, allowing selective activation of model capacity per token during inference.
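The idea behind that selective activation can be shown with a toy router: it scores every expert per token and only the top-k experts run, so most of the ~357B parameters stay inactive on any given token. Dimensions and k below are made up for illustration and are not GLM-4.6's real configuration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    probs = softmax(router_logits)
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}

# One token's router scores over 8 experts: only 2 experts activate,
# and their gate weights sum to 1.
gates = route([0.1, 2.3, -0.5, 1.8, 0.0, -1.2, 0.7, 0.3], k=2)
print(gates)
```

The token's output is then the gate-weighted sum of just those chosen experts' outputs, which is why inference cost tracks the activated subset rather than the full parameter count.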

Ready to build with GLM 4.6?

Get Started Free

Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark What it tests Score
MMLU-Pro Expert knowledge across 14 academic disciplines 78.4%
GPQA Diamond PhD-level science questions (biology, physics, chemistry) 63.2%
LiveCodeBench Real-world coding tasks from recent competitions 56.1%
HLE (Humanity's Last Exam) Questions that challenge frontier models across many domains 5.2%
SciCode Scientific research coding and numerical methods 33.1%

Common questions about GLM 4.6

What is the context window size for GLM-4.6?

GLM-4.6 supports a context window of 200,000 tokens, which is approximately 150,000 words. This allows it to process long documents, large codebases, or extended conversation histories in a single request.

What license does GLM-4.6 use?

GLM-4.6 is released under the MIT license, which permits royalty-free use in both commercial and personal projects; the only obligation is to retain the copyright and license notice.

What is the knowledge cutoff date for GLM-4.6?

According to the model metadata, GLM-4.6 has a training data cutoff of September 2025.

How many parameters does GLM-4.6 have?

GLM-4.6 is built on a Mixture-of-Experts architecture with approximately 357 billion total parameters. MoE models activate only a subset of parameters per token during inference.

What languages does GLM-4.6 support?

GLM-4.6 natively supports both English and Chinese, making it suitable for bilingual use cases and applications targeting users in either language.

What kinds of tasks is GLM-4.6 best suited for?

GLM-4.6 is designed for complex coding tasks, long-document analysis, agentic AI workflows that require tool use during reasoning, and bilingual English/Chinese applications.

What people think about GLM 4.6

Community reception on r/LocalLLaMA has been broadly positive, with the GLM-4.6 announcement post receiving over 400 upvotes and 81 comments. Users have highlighted its large context window, open-weight availability under the MIT license, and performance on coding and agentic tasks as notable strengths.

A separate thread about the GLM-4.6-Air variant attracted over 600 upvotes, suggesting strong interest in lighter, more easily deployable versions of the model. Discussions also reference the subsequent GLM-4.7 release, indicating an active development cadence that some users follow closely for local deployment use cases.


Parameters & options

Max Temperature 1
Max Response Size 16,384 tokens
Reasoning Effort medium (default)
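A minimal sketch of a request honoring these limits: temperature capped at 1, responses capped at 16,384 tokens, and reasoning effort defaulting to medium. The `reasoning_effort` field name and the model id are assumptions based on common OpenAI-style APIs; check your provider's documentation for the exact keys:

```python
def build_request(prompt: str, temperature: float = 0.7,
                  max_tokens: int = 16_384,
                  reasoning_effort: str = "medium") -> dict:
    """Build a chat request, validating against the listed parameter limits."""
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be within [0, 1] for this model")
    if max_tokens > 16_384:
        raise ValueError("max response size is 16,384 tokens")
    return {
        "model": "zai-org/GLM-4.6",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "reasoning_effort": reasoning_effort,
    }

req = build_request("Summarize this repository's architecture.")
```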

Start building with GLM 4.6

No API keys required. Create AI-powered workflows with GLM 4.6 in minutes — free.