Text Generation Model

GLM 5

GLM-5 is a 744B-parameter open-weight frontier model from Z.ai, built for complex reasoning, coding, and long-horizon agentic tasks — and trained entirely on domestic Chinese hardware.

Publisher Z.ai
Type Text
Context Window 200,000 tokens
Training Data February 2026
Input $0.80/MTok
Output $2.56/MTok
Provider DeepInfra

744B open-weight model for agentic reasoning

GLM-5 is a 744-billion-parameter Mixture-of-Experts language model developed by Z.ai (formerly Zhipu AI), released in February 2026 under the MIT license. It activates 40 billion parameters per token and supports a 200,000-token context window, making it well suited for tasks that require processing large volumes of text in a single pass. The model was pre-trained on 28.5 trillion tokens and incorporates DeepSeek Sparse Attention to reduce inference costs while maintaining long-context performance.

GLM-5 is designed primarily for agentic workflows, autonomous software engineering, tool use, and long-horizon planning tasks. A notable aspect of its development is that it was trained entirely on Huawei Ascend chips using the MindSpore framework, with no dependency on NVIDIA hardware. It also introduces an asynchronous reinforcement learning training system called slime, which improves training throughput and enables more fine-grained post-training alignment. The model is freely available for both research and commercial use under its MIT license.
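
As an illustration of that agentic, tool-using focus, the sketch below shows what a function-calling request to GLM-5 through an OpenAI-compatible client might look like. The endpoint URL, model identifier, and tool-calling details are assumptions for illustration, not values confirmed on this page.

```python
# Hypothetical sketch: asking GLM-5 to call a tool via an OpenAI-compatible API.
# The endpoint, model name, and tool-calling support are assumptions, not confirmed here.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed provider endpoint
    api_key="YOUR_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool for an agentic coding workflow
        "description": "Run the project's test suite and return any failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="zai-org/GLM-5",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Fix the failing tests in ./src."}],
    tools=tools,
)

# If the model decides to use the tool, the call shows up here.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```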

What GLM 5 supports

Long-Context Processing

Handles inputs up to 200,000 tokens in a single context window, enabling analysis of large codebases, documents, or multi-turn conversation histories.

Complex Reasoning

Applies multi-step reasoning across math, science, and logic tasks, scoring 92.7% on AIME 2026 I and 82.0% on the GPQA Diamond benchmark.

Autonomous Coding

Executes software engineering tasks end-to-end, achieving 77.8% on SWE-bench Verified and 73.3% on SWE-bench Multilingual.

Agentic Task Execution

Supports long-horizon agentic workflows including tool use, web research, and multi-step planning across extended task sequences.

Mixture-of-Experts Architecture

Uses a sparse MoE design with 744B total parameters but only 40B active per token, reducing compute cost per inference call (a toy routing sketch appears after this list).

Reinforcement Learning Alignment

Post-trained using the asynchronous slime RL infrastructure, which improves training throughput and fine-grained alignment beyond standard pre-training.

Text Generation

Generates structured and unstructured text outputs for tasks including summarization, drafting, and question answering across multiple languages.
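
To make the Mixture-of-Experts item above more concrete, here is a toy routing sketch (illustrative only, not GLM-5's actual implementation): a router scores every expert for each token, but only a small top-k subset runs, which is why far fewer parameters are active than the model contains in total.

```python
import numpy as np

# Toy sketch of sparse MoE routing (illustrative only, not GLM-5's real code).
# Expert count, dimensions, and top-k value below are made-up toy numbers.
rng = np.random.default_rng(0)
num_experts, top_k, d_model = 16, 2, 8

router_weights = rng.normal(size=(num_experts, d_model))  # router: one score per expert
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]

def route(token_vec):
    """Pick the top-k experts for one token and softmax-normalize their scores."""
    scores = router_weights @ token_vec
    top = np.argsort(scores)[-top_k:]
    w = np.exp(scores[top] - scores[top].max())
    return top, w / w.sum()

token = rng.normal(size=d_model)
idx, weights = route(token)

# Only the selected experts run; the rest of the parameters stay idle for this token.
output = sum(w * (experts[i] @ token) for i, w in zip(idx, weights))
print(f"experts used: {idx.tolist()} ({top_k}/{num_experts} = {top_k/num_experts:.0%} of experts)")
```

Scaled up, the same idea is what keeps GLM-5's per-token cost down: 40B active parameters out of 744B total is roughly 5% of the model touched on any given token.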

Ready to build with GLM 5?

Get Started Free

Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark What it tests Score
GPQA Diamond PhD-level science questions (biology, physics, chemistry) 82.0%
HLE (Humanity's Last Exam) Questions across many domains designed to challenge frontier models 27.2%
SciCode Scientific research coding and numerical methods 46.2%
SWE-bench Verified Real GitHub issues requiring multi-file code fixes 77.8%
BrowseComp Complex web browsing and information retrieval 75.9%

Common questions about GLM 5

What is the context window for GLM-5?

GLM-5 supports a 200,000-token context window, allowing it to process large documents, long codebases, or extended multi-turn conversations in a single pass.
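
As a rough way to check whether an input fits in that window, a character-count heuristic works in a pinch; the 4-characters-per-token ratio below is only an approximation, and GLM-5's actual tokenizer will count differently.

```python
# Rough fit check against a 200,000-token context window.
# Assumes ~4 characters per token on average; the real tokenizer will differ.
CONTEXT_WINDOW = 200_000
CHARS_PER_TOKEN = 4  # crude heuristic, not GLM-5's actual tokenizer

def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, reserve_for_output: int = 16_384) -> bool:
    """Leave room for the response (the page lists a 16,384-token max output)."""
    return estimated_tokens(text) + reserve_for_output <= CONTEXT_WINDOW

doc = "def main():\n    ...\n" * 50_000  # stand-in for a large codebase dump
print(estimated_tokens(doc), fits_in_context(doc))
```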

How many parameters does GLM-5 have?

GLM-5 is a Mixture-of-Experts model with 744 billion total parameters. It activates 40 billion parameters per token during inference, which reduces the compute cost relative to a dense model of the same total size.

What is the training data cutoff for GLM-5?

Based on the available metadata, GLM-5's training data extends through February 2026. A precise knowledge cutoff date is not specified in the provided metadata.

What license does GLM-5 use?

GLM-5 is released under the MIT license, which permits both research and commercial use without royalty obligations.

What hardware was GLM-5 trained on?

GLM-5 was trained entirely on Huawei Ascend chips using the MindSpore framework. It has no dependency on NVIDIA hardware, making it notable as a large-scale model trained on China's domestic AI compute infrastructure.

What tasks is GLM-5 best suited for?

GLM-5 is designed for agentic workflows, autonomous software engineering, tool use, web research, and long-horizon planning tasks. It also performs well on advanced mathematics and graduate-level science reasoning based on its benchmark results.

What people think about GLM 5

Community reception on r/LocalLLaMA was broadly positive at launch, with users highlighting GLM-5's strong benchmark scores in software engineering and math reasoning as well as its MIT license enabling open commercial use. The thread about Z.ai's GPU constraints attracted significant attention, with many users remarking on the achievement of training a model of this scale entirely on Huawei Ascend hardware.

Some community members raised questions about real-world performance relative to benchmark numbers, and a later thread on r/singularity pointed to GLM-5's ARC-AGI 2 results as underwhelming compared to its other reported scores. Discussions also covered availability on platforms like OpenRouter ahead of the official release.


Parameters & options

Max Temperature 1
Max Response Size 16,384 tokens
Reasoning Effort Default: medium
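
A hedged sketch of how these options might map onto a request is shown below; the exact field names, especially for reasoning effort, are assumptions and may vary by provider, while the values come from the options listed above.

```python
# Hypothetical request using the options listed above.
# Parameter names are assumptions; check the provider's docs for the exact fields.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed provider endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="zai-org/GLM-5",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Plan a refactor of this module."}],
    temperature=1.0,         # max temperature listed: 1
    max_tokens=16_384,       # max response size listed: 16,384 tokens
    extra_body={"reasoning_effort": "medium"},  # assumed field name; default listed: medium
)
print(response.choices[0].message.content)
```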

Start building with GLM 5

No API keys required. Create AI-powered workflows with GLM 5 in minutes — free.