Text Generation Model

GLM 5

GLM-5 is a 744B-parameter open-weight frontier model from Z.ai, built for complex reasoning, coding, and long-horizon agentic tasks — and trained entirely on domestic Chinese hardware.

Publisher Z.ai
Type Text
Context Window 200,000 tokens
Training Data February 2026
Input $0.80/MTok
Output $2.56/MTok
Provider DeepInfra

744B open-weight model for agentic reasoning

GLM-5 is a 744-billion-parameter Mixture-of-Experts language model developed by Z.ai (formerly Zhipu AI), released in February 2026 under the MIT license. It activates 40 billion parameters per token and supports a 200,000-token context window, making it well suited for tasks that require processing large volumes of text in a single pass. The model was pre-trained on 28.5 trillion tokens and incorporates DeepSeek Sparse Attention to reduce inference costs while maintaining long-context performance.

GLM-5 is designed primarily for agentic workflows, autonomous software engineering, tool use, and long-horizon planning tasks. A notable aspect of its development is that it was trained entirely on Huawei Ascend chips using the MindSpore framework, with no dependency on NVIDIA hardware. It also introduces an asynchronous reinforcement learning training system called slime, which improves training throughput and enables more fine-grained post-training alignment. The model is freely available for both research and commercial use under its MIT license.
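
As an illustration of that agentic, tool-using focus, the sketch below shows what a function-calling request to GLM-5 through an OpenAI-compatible client might look like. The endpoint URL, model identifier, and tool-calling details are assumptions for illustration, not values confirmed on this page.

```python
# Hypothetical sketch: asking GLM-5 to call a tool via an OpenAI-compatible API.
# The endpoint, model name, and tool-calling support are assumptions, not confirmed here.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed provider endpoint
    api_key="YOUR_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool for an agentic coding workflow
        "description": "Run the project's test suite and return any failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="zai-org/GLM-5",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Fix the failing tests in ./src."}],
    tools=tools,
)

# If the model decides to use the tool, the call shows up here.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```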

What GLM 5 supports

Long-Context Processing

Handles inputs up to 200,000 tokens in a single context window, enabling analysis of large codebases, documents, or multi-turn conversation histories.

Complex Reasoning

Applies multi-step reasoning across math, science, and logic tasks, scoring 92.7% on AIME 2026 I and 82.0% on the GPQA Diamond benchmark.

Autonomous Coding

Executes software engineering tasks end-to-end, achieving 77.8% on SWE-bench Verified and 73.3% on SWE-bench Multilingual.

Agentic Task Execution

Supports long-horizon agentic workflows including tool use, web research, and multi-step planning across extended task sequences.

Mixture-of-Experts Architecture

Uses a sparse MoE design with 744B total parameters but only 40B active per token, reducing compute cost per inference call (a toy routing sketch appears after this list).

Reinforcement Learning Alignment

Post-trained using the asynchronous slime RL infrastructure, which improves training throughput and fine-grained alignment beyond standard pre-training.

Text Generation

Generates structured and unstructured text outputs for tasks including summarization, drafting, and question answering across multiple languages.
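
To make the Mixture-of-Experts item above more concrete, here is a toy routing sketch (illustrative only, not GLM-5's actual implementation): a router scores every expert for each token, but only a small top-k subset runs, which is why far fewer parameters are active than the model contains in total.

```python
import numpy as np

# Toy sketch of sparse MoE routing (illustrative only, not GLM-5's real code).
# Expert count, dimensions, and top-k value below are made-up toy numbers.
rng = np.random.default_rng(0)
num_experts, top_k, d_model = 16, 2, 8

router_weights = rng.normal(size=(num_experts, d_model))  # router: one score per expert
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]

def route(token_vec):
    """Pick the top-k experts for one token and softmax-normalize their scores."""
    scores = router_weights @ token_vec
    top = np.argsort(scores)[-top_k:]
    w = np.exp(scores[top] - scores[top].max())
    return top, w / w.sum()

token = rng.normal(size=d_model)
idx, weights = route(token)

# Only the selected experts run; the rest of the parameters stay idle for this token.
output = sum(w * (experts[i] @ token) for i, w in zip(idx, weights))
print(f"experts used: {idx.tolist()} ({top_k}/{num_experts} = {top_k/num_experts:.0%} of experts)")
```

Scaled up, the same idea is what keeps GLM-5's per-token cost down: 40B active parameters out of 744B total is roughly 5% of the model touched on any given token.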

Ready to build with GLM 5?

Get Started Free

Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark What it tests Score
GPQA Diamond PhD-level science questions (biology, physics, chemistry) 82.0%
HLE (Humanity's Last Exam) Questions across many domains designed to challenge frontier models 27.2%
SciCode Scientific research coding and numerical methods 46.2%
SWE-bench Verified Real GitHub issues requiring multi-file code fixes 77.8%
BrowseComp Complex web browsing and information retrieval 75.9%

Common questions about GLM 5

What is the context window for GLM-5?

GLM-5 supports a 200,000-token context window, allowing it to process large documents, long codebases, or extended multi-turn conversations in a single pass.
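
As a rough way to check whether an input fits in that window, a character-count heuristic works in a pinch; the 4-characters-per-token ratio below is only an approximation, and GLM-5's actual tokenizer will count differently.

```python
# Rough fit check against a 200,000-token context window.
# Assumes ~4 characters per token on average; the real tokenizer will differ.
CONTEXT_WINDOW = 200_000
CHARS_PER_TOKEN = 4  # crude heuristic, not GLM-5's actual tokenizer

def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, reserve_for_output: int = 16_384) -> bool:
    """Leave room for the response (the page lists a 16,384-token max output)."""
    return estimated_tokens(text) + reserve_for_output <= CONTEXT_WINDOW

doc = "def main():\n    ...\n" * 50_000  # stand-in for a large codebase dump
print(estimated_tokens(doc), fits_in_context(doc))
```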

How many parameters does GLM-5 have?

GLM-5 is a Mixture-of-Experts model with 744 billion total parameters. It activates 40 billion parameters per token during inference, which reduces the compute cost relative to a dense model of the same total size.

What is the training data cutoff for GLM-5?

Based on the available metadata, GLM-5's training data extends through February 2026. A precise knowledge cutoff date is not specified in the provided metadata.

What license does GLM-5 use?

GLM-5 is released under the MIT license, which permits both research and commercial use without royalty obligations.

What hardware was GLM-5 trained on?

GLM-5 was trained entirely on Huawei Ascend chips using the MindSpore framework. It has no dependency on NVIDIA hardware, making it notable as a large-scale model trained on China's domestic AI compute infrastructure.

What tasks is GLM-5 best suited for?

GLM-5 is designed for agentic workflows, autonomous software engineering, tool use, web research, and long-horizon planning tasks. It also performs well on advanced mathematics and graduate-level science reasoning based on its benchmark results.

What people think about GLM 5

Community reception on r/LocalLLaMA was broadly positive at launch, with users highlighting GLM-5's strong benchmark scores in software engineering and math reasoning as well as its MIT license enabling open commercial use. The thread about Z.ai's GPU constraints attracted significant attention, with many users remarking on the achievement of training a model of this scale entirely on Huawei Ascend hardware.

Some community members raised questions about real-world performance relative to benchmark numbers, and a later thread on r/singularity pointed to GLM-5's ARC-AGI 2 results as underwhelming compared to its other reported scores. Discussions also covered availability on platforms like OpenRouter ahead of the official release.


Parameters & options

Max Temperature 1
Max Response Size 16,384 tokens
Reasoning Effort Default: medium
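
A hedged sketch of how these options might map onto a request is shown below; the exact field names, especially for reasoning effort, are assumptions and may vary by provider, while the values come from the options listed above.

```python
# Hypothetical request using the options listed above.
# Parameter names are assumptions; check the provider's docs for the exact fields.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed provider endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="zai-org/GLM-5",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Plan a refactor of this module."}],
    temperature=1.0,         # max temperature listed: 1
    max_tokens=16_384,       # max response size listed: 16,384 tokens
    extra_body={"reasoning_effort": "medium"},  # assumed field name; default listed: medium
)
print(response.choices[0].message.content)
```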

Start building with GLM 5

No API keys required. Create AI-powered workflows with GLM 5 in minutes — free.