MindStudio
Text Generation Model

GLM 4.6

GLM-4.6 is a powerful 357B Mixture-of-Experts language model from Zhipu AI featuring a 200K context window, advanced reasoning, and top-tier coding and agentic capabilities.

Publisher Z.ai
Type Text
Context Window 200,000 tokens
Training Data Cutoff September 2025
Input $0.43/MTok
Output $1.74/MTok
Provider DeepInfra
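The listed rates make per-request cost easy to estimate. A minimal sketch, using the input and output prices above; the helper function is illustrative, not part of any official SDK:

```python
# Estimate request cost from the listed DeepInfra rates for GLM-4.6.
INPUT_PER_MTOK = 0.43   # USD per 1,000,000 input tokens
OUTPUT_PER_MTOK = 1.74  # USD per 1,000,000 output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_PER_MTOK

# A full-context request (200K tokens in) with a maximum-length
# reply (16,384 tokens out) comes to a bit over 11 cents:
print(f"${estimate_cost(200_000, 16_384):.4f}")  # prints $0.1145
```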

357B MoE model with 200K context and tool-use reasoning

GLM-4.6 is a large language model developed by Zhipu AI (Z.ai), built on a Mixture-of-Experts architecture with approximately 357 billion parameters. It supports both English and Chinese, carries a 200,000-token context window, and is released under the MIT license, which permits commercial and personal use with only minimal obligations (retaining the copyright and license notice). The model was released in late 2025 and represents Zhipu AI's flagship offering in the GLM series.

GLM-4.6 is designed for tasks that require extended context handling, multi-step reasoning, and agentic workflows. A notable characteristic is its ability to invoke tools during the reasoning process itself — not only after completing a chain of thought — which enables more dynamic problem-solving in agent-based applications. It is well suited for developers and researchers working on complex coding tasks, long-document analysis, bilingual applications, and automated multi-step pipelines.

What GLM 4.6 supports

Extended Context Window

Processes up to 200,000 tokens in a single request, equivalent to roughly 150,000 words, enabling analysis of long documents and large codebases without losing earlier context.

Tool-Use Reasoning

Supports tool calling during the reasoning process itself, allowing the model to query APIs or search for information while thinking through a problem rather than only after.
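In practice, tools are declared in the request and the model decides when to call them. A minimal sketch of a tool-calling request body in the OpenAI-compatible chat format that providers such as DeepInfra accept; the model id, the `get_weather` tool, and its schema are illustrative assumptions, not fixed names:

```python
import json

def build_tool_request(question: str) -> dict:
    """Build a chat request that lets the model call tools while reasoning."""
    return {
        "model": "zai-org/GLM-4.6",  # model id as listed on DeepInfra (assumed)
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",  # model decides whether and when to call
    }

payload = build_tool_request("Do I need an umbrella in Beijing today?")
print(json.dumps(payload, indent=2)[:60])
```

The response's `tool_calls` field then carries any requested invocations; your application executes them and sends the results back as `tool` messages for the model to continue reasoning over.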

Code Generation

Handles real-world programming tasks including front-end web page generation and integrates with coding tools such as Claude Code, Cline, Roo Code, and Kilo Code.

Agentic Workflows

Built for multi-step agent pipelines, performing well on tool-use benchmarks and integrating into agent frameworks for automated task execution.

Bilingual Language Support

Natively supports both English and Chinese, making it suitable for bilingual applications and cross-language document processing.

Long-Form Text Generation

Produces extended written content and handles role-playing scenarios, with outputs tuned toward human-preferred writing style and coherence.

MoE Architecture

Uses a Mixture-of-Experts design with approximately 357 billion total parameters, allowing selective activation of model capacity per token during inference.
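The idea behind that selective activation can be shown with a toy router: it scores every expert per token and only the top-k experts run, so most of the ~357B parameters stay inactive on any given token. Dimensions and k below are made up for illustration and are not GLM-4.6's real configuration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    probs = softmax(router_logits)
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}

# One token's router scores over 8 experts: only 2 experts activate,
# and their gate weights sum to 1.
gates = route([0.1, 2.3, -0.5, 1.8, 0.0, -1.2, 0.7, 0.3], k=2)
print(gates)
```

The token's output is then the gate-weighted sum of just those chosen experts' outputs, which is why inference cost tracks the activated subset rather than the full parameter count.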

Ready to build with GLM 4.6?

Get Started Free

Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark What it tests Score
MMLU-Pro Expert knowledge across 14 academic disciplines 78.4%
GPQA Diamond PhD-level science questions (biology, physics, chemistry) 63.2%
LiveCodeBench Real-world coding tasks from recent competitions 56.1%
HLE (Humanity's Last Exam) Questions that challenge frontier models across many domains 5.2%
SciCode Scientific research coding and numerical methods 33.1%

Common questions about GLM 4.6

What is the context window size for GLM-4.6?

GLM-4.6 supports a context window of 200,000 tokens, which is approximately 150,000 words. This allows it to process long documents, large codebases, or extended conversation histories in a single request.

What license does GLM-4.6 use?

GLM-4.6 is released under the MIT license, which permits royalty-free use in both commercial and personal projects; the only obligation is to retain the copyright and license notice.

What is the knowledge cutoff date for GLM-4.6?

According to the model metadata, GLM-4.6 has a training data cutoff of September 2025.

How many parameters does GLM-4.6 have?

GLM-4.6 is built on a Mixture-of-Experts architecture with approximately 357 billion total parameters. MoE models activate only a subset of parameters per token during inference.

What languages does GLM-4.6 support?

GLM-4.6 natively supports both English and Chinese, making it suitable for bilingual use cases and applications targeting users in either language.

What kinds of tasks is GLM-4.6 best suited for?

GLM-4.6 is designed for complex coding tasks, long-document analysis, agentic AI workflows that require tool use during reasoning, and bilingual English/Chinese applications.

What people think about GLM 4.6

Community reception on r/LocalLLaMA has been broadly positive, with the GLM-4.6 announcement post receiving over 400 upvotes and 81 comments. Users have highlighted its large context window, open-weight availability under the MIT license, and performance on coding and agentic tasks as notable strengths.

A separate thread about the GLM-4.6-Air variant attracted over 600 upvotes, suggesting strong interest in lighter, more easily deployable versions of the model. Discussions also reference the subsequent GLM-4.7 release, indicating an active development cadence that some users follow closely for local deployment use cases.


Parameters & options

Max Temperature 1
Max Response Size 16,384 tokens
Reasoning Effort medium (default)
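A minimal sketch of a request honoring these limits: temperature capped at 1, responses capped at 16,384 tokens, and reasoning effort defaulting to medium. The `reasoning_effort` field name and the model id are assumptions based on common OpenAI-style APIs; check your provider's documentation for the exact keys:

```python
def build_request(prompt: str, temperature: float = 0.7,
                  max_tokens: int = 16_384,
                  reasoning_effort: str = "medium") -> dict:
    """Build a chat request, validating against the listed parameter limits."""
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be within [0, 1] for this model")
    if max_tokens > 16_384:
        raise ValueError("max response size is 16,384 tokens")
    return {
        "model": "zai-org/GLM-4.6",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "reasoning_effort": reasoning_effort,
    }

req = build_request("Summarize this repository's architecture.")
```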

Start building with GLM 4.6

No API keys required. Create AI-powered workflows with GLM 4.6 in minutes — free.