GLM 5
GLM-5 is a 744B-parameter open-weight frontier model from Z.ai, built for complex reasoning, coding, and long-horizon agentic tasks — and trained entirely on domestic Chinese hardware.
744B open-weight model for agentic reasoning
GLM-5 is a 744-billion-parameter Mixture-of-Experts language model developed by Z.ai (formerly Zhipu AI), released in February 2026 under the MIT license. It activates 40 billion parameters per token and supports a 200,000-token context window, making it suited for tasks that require processing large volumes of text in a single pass. The model was pre-trained on 28.5 trillion tokens and incorporates DeepSeek Sparse Attention to reduce inference costs while maintaining long-context performance.
GLM-5 is designed primarily for agentic workflows, autonomous software engineering, tool use, and long-horizon planning tasks. A notable aspect of its development is that it was trained entirely on Huawei Ascend chips using the MindSpore framework, with no dependency on NVIDIA hardware. It also introduces an asynchronous reinforcement learning training system called slime, which improves training throughput and enables more fine-grained post-training alignment. The model is freely available for both research and commercial use under its MIT license.
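Since GLM-5 is open-weight and typically served behind OpenAI-compatible chat endpoints, a minimal request might look like the sketch below. The endpoint URL, model id (`glm-5`), and the `GLM_API_KEY` environment variable are all assumptions for illustration, not confirmed values; check Z.ai's documentation for the real ones.

```python
import json
import os
import urllib.request

# Hypothetical endpoint and model id -- verify against Z.ai's docs.
API_URL = "https://api.z.ai/v1/chat/completions"
MODEL_ID = "glm-5"

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def call_glm5(prompt: str) -> str:
    """POST the request; needs a valid key in the GLM_API_KEY env var."""
    data = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        API_URL,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['GLM_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]
```

The same payload shape works with aggregators such as OpenRouter by swapping the base URL and model identifier.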
What GLM 5 supports
Long-Context Processing
Handles inputs up to 200,000 tokens in a single context window, enabling analysis of large codebases, documents, or multi-turn conversation histories.
Complex Reasoning
Applies multi-step reasoning across math, science, and logic tasks, scoring 92.7% on AIME 2026 I and 86.0% on GPQA-Diamond benchmarks.
Autonomous Coding
Executes software engineering tasks end-to-end, achieving 77.8% on SWE-bench Verified and 73.3% on SWE-bench Multilingual.
Agentic Task Execution
Supports long-horizon agentic workflows including tool use, web research, and multi-step planning across extended task sequences.
Mixture-of-Experts Architecture
Uses a sparse MoE design with 744B total parameters but only 40B active per token, cutting per-token compute well below that of a dense model of the same total size.
Reinforcement Learning Alignment
Post-trained using the asynchronous slime RL infrastructure, which improves training throughput and fine-grained alignment beyond standard pre-training.
Text Generation
Generates structured and unstructured text outputs for tasks including summarization, drafting, and question answering across multiple languages.
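As a rough pre-flight check against the 200,000-token context window described above, you can estimate token counts before sending a request. GLM-5's actual tokenizer is not described here, so this sketch assumes a coarse ~4-characters-per-token heuristic typical of English text; real counts vary by language and tokenizer.

```python
# Assumed figures: the 200K window is from the model card above;
# CHARS_PER_TOKEN is a crude heuristic, not GLM-5's real tokenizer.
CONTEXT_WINDOW = 200_000
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, reserved_for_output: int = 8_000) -> bool:
    """True if the prompt likely fits, leaving room for the reply."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW
```

For production use, replace the heuristic with the model's actual tokenizer so large codebases or document sets are chunked accurately.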
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 82.0% |
| HLE | Questions that challenge frontier models across many domains | 27.2% |
| SciCode | Scientific research coding and numerical methods | 46.2% |
| SWE-bench Verified | Real GitHub issues requiring multi-file code fixes | 77.8% |
| BrowseComp | Complex web browsing and information retrieval | 75.9% |
Common questions about GLM 5
What is the context window for GLM-5?
GLM-5 supports a 200,000-token context window, allowing it to process large documents, long codebases, or extended multi-turn conversations in a single pass.
How many parameters does GLM-5 have?
GLM-5 is a Mixture-of-Experts model with 744 billion total parameters. It activates 40 billion parameters per token during inference, which reduces the compute cost relative to a dense model of the same total size.
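The compute saving can be sketched with back-of-envelope arithmetic, using the standard estimate of roughly 2 FLOPs per parameter per token for a forward pass. This ignores attention and expert-routing overhead, so treat the numbers as order-of-magnitude only.

```python
# Parameter counts from the model description; the 2*N FLOPs/token
# forward-pass rule is a common approximation, not an exact measurement.
TOTAL_PARAMS = 744e9    # all experts combined
ACTIVE_PARAMS = 40e9    # parameters routed per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS          # ~5.4% of weights used per token
flops_per_token_moe = 2 * ACTIVE_PARAMS                 # ~8e10 FLOPs/token
flops_per_token_dense = 2 * TOTAL_PARAMS                # ~1.5e12 FLOPs/token (hypothetical dense 744B)
speedup = flops_per_token_dense / flops_per_token_moe   # 18.6x fewer FLOPs per token
```

By this estimate, each token costs about as much compute as a 40B dense model, even though the full 744B parameter set must still be held in memory.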
What is the training data cutoff for GLM-5?
Based on available metadata, GLM-5 carries a release date of February 2026, but a precise knowledge cutoff for its training data is not publicly specified.
What license does GLM-5 use?
GLM-5 is released under the MIT license, which permits both research and commercial use without royalty obligations.
What hardware was GLM-5 trained on?
GLM-5 was trained entirely on Huawei Ascend chips using the MindSpore framework. It has no dependency on NVIDIA hardware, making it notable as a large-scale model trained on China's domestic AI compute infrastructure.
What tasks is GLM-5 best suited for?
GLM-5 is designed for agentic workflows, autonomous software engineering, tool use, web research, and long-horizon planning tasks. It also performs well on advanced mathematics and graduate-level science reasoning based on its benchmark results.
What people think about GLM 5
Community reception on r/LocalLLaMA was broadly positive at launch, with users highlighting GLM-5's strong benchmark scores in software engineering and math reasoning, as well as its MIT license enabling open commercial use. The thread about Z.ai's GPU constraints attracted particular attention, with many users remarking on the feat of training a model of this scale entirely on Huawei Ascend hardware.
Some community members raised questions about real-world performance relative to benchmark numbers, and a later thread on r/singularity pointed to GLM-5's ARC-AGI 2 results as underwhelming compared to its other reported scores. Discussions also covered availability on platforms like OpenRouter ahead of the official release.
Z.ai said they are GPU starved, openly.
GLM-5 Officially Released
GLM 5 Is Being Tested On OpenRouter
Chinese models' ARC-AGI 2 results seem underwhelming compared to their benchmarks results
Start building with GLM 5
No API keys required. Create AI-powered workflows with GLM 5 in minutes — free.