Qwen3 235B
Alibaba's updated 235B Mixture-of-Experts model delivering top-tier reasoning, coding, and long-context understanding while activating only 22B parameters per token.
235B MoE model activating 22B parameters per token
Qwen3 235B is an instruction-tuned large language model developed by Alibaba's Qwen team, built on a Mixture-of-Experts (MoE) architecture with 235 billion total parameters. During inference, only 22 billion parameters are activated at a time, which reduces computational cost relative to the model's full parameter count. The model supports a native context window of 262,144 tokens and is released under the Apache 2.0 license, permitting commercial use.
This release, versioned as Qwen3-235B-A22B-Instruct-2507, is the non-thinking instruct variant, meaning it produces direct responses without exposing an internal chain-of-thought. It is designed for instruction following, agentic workflows, tool use, multilingual tasks, complex question answering, and coding. The model scores 51.8% on LiveCodeBench v6, 70.3% on AIME25, and 77.5% on GPQA, reflecting its range across coding, mathematical reasoning, and knowledge-intensive tasks.
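To show how the model is typically called in practice, here is a minimal sketch of a chat request through an OpenAI-compatible endpoint. The base URL, API key, and exact model ID below are placeholders and will vary by hosting provider.

```python
# Minimal sketch: a chat request to Qwen3-235B-A22B-Instruct-2507 via an
# OpenAI-compatible endpoint. Base URL, API key, and model ID are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                      # placeholder credential
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",  # exact ID may differ by provider
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Mixture-of-Experts inference in two sentences."},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
```

Because this is the non-thinking instruct variant, the reply arrives as a direct answer with no exposed reasoning trace.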
What Qwen3 235B supports
Long Context Processing
Handles up to 262,144 tokens natively in a single context window, with extended context support available via advanced attention mechanisms.
Instruction Following
Optimized for direct, helpful responses as the non-thinking instruct variant, without exposing internal chain-of-thought output.
Code Generation
Scores 51.8% on LiveCodeBench v6, covering real-world programming tasks across multiple languages.
Mathematical Reasoning
Achieves 70.3% on AIME25 and 41.8% on ARC-AGI, handling multi-step mathematical and logical problem solving.
Knowledge Retrieval
Scores 77.5% on GPQA and 54.3% on SimpleQA, reflecting broad factual knowledge across science and general domains.
Agentic Tool Use
Supports agentic workflows and tool-use scenarios, making it suitable for multi-step task execution and API-integrated pipelines.
Multilingual Text Generation
Generates and understands text across multiple languages, consistent with the broader Qwen3 model family's multilingual training.
MoE Efficient Inference
Uses a Mixture-of-Experts architecture that activates only 22B of 235B parameters per forward pass, reducing per-token compute.
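To make the per-token activation idea concrete, here is a toy sketch of top-k expert routing in PyTorch. The expert count, hidden size, and top-k values are illustrative only and do not reflect Qwen3's actual configuration or routing code.

```python
# Toy top-k MoE routing (illustrative values, not Qwen3's real configuration).
# Every token is scored against all experts, but only the top-k experts run,
# so per-token compute is a small fraction of the total parameter count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(ToyMoELayer()(tokens).shape)                       # torch.Size([10, 64])
```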
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 76.2% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 61.3% |
| MATH-500 | Undergraduate and competition-level math problems | 90.2% |
| AIME 2024 | Problems from the 2024 American Invitational Mathematics Examination | 32.7% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 34.3% |
| HLE | Humanity's Last Exam: frontier-level questions across many domains | 4.7% |
| SciCode | Scientific research coding and numerical methods | 29.9% |
Common questions about Qwen3 235B
What is the context window for Qwen3 235B?
Qwen3 235B supports a native context window of 262,144 tokens, which is approximately 200,000 words. Extended context beyond this, up to 1M tokens for the 2507 variants, is available via long-context attention mechanisms.
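As a rough pre-flight check, you can count a document's tokens with the model's tokenizer before sending it. The sketch below assumes the Hugging Face repo ID Qwen/Qwen3-235B-A22B-Instruct-2507 and a placeholder input file.

```python
# Minimal sketch: checking whether a document fits the 262,144-token native
# context window. The repo ID and file path below are assumptions/placeholders.
from transformers import AutoTokenizer

MAX_CONTEXT = 262_144

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Instruct-2507")

with open("long_report.txt", encoding="utf-8") as f:  # placeholder document
    document = f.read()

n_tokens = len(tokenizer.encode(document))
print(f"{n_tokens} tokens ({n_tokens / MAX_CONTEXT:.0%} of the native window)")

if n_tokens > MAX_CONTEXT:
    print("Too long for the native window: chunk the input or use extended-context serving.")
```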
How many parameters are actually used during inference?
Although the model has 235 billion total parameters, only 22 billion are activated at a time during inference due to its Mixture-of-Experts architecture.
What is the difference between this model and the Thinking variant?
This is the instruct (non-thinking) variant, which produces direct responses without exposing internal chain-of-thought reasoning. The Thinking variant (Qwen3-235B-A22B-Thinking-2507) is a separate model that outputs its reasoning process before answering.
What is the training data cutoff for this model?
The model metadata lists a training date of July 2025, which corresponds to the 2507 version suffix in the model name (year 2025, month 07).
What license does Qwen3 235B use?
Qwen3 235B is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution subject to the license terms.
What tasks is this model best suited for?
The model is designed for instruction following, agentic workflows, tool use, complex question answering, coding, multilingual tasks, and creative writing. It is not the recommended choice when visible chain-of-thought reasoning is required, as that is handled by the separate Thinking variant.
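For the agentic and tool-use scenarios mentioned above, the sketch below shows how a tool schema can be passed through an OpenAI-compatible chat API. The endpoint, API key, and get_weather tool are illustrative placeholders rather than anything specific to Qwen3.

```python
# Illustrative tool-use request via an OpenAI-compatible API.
# Endpoint, API key, and the get_weather tool are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",  # exact ID may differ by provider
    messages=[{"role": "user", "content": "Do I need an umbrella in Hangzhou today?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The model chose to call a tool; arguments arrive as a JSON string.
    print(message.tool_calls[0].function.name, message.tool_calls[0].function.arguments)
else:
    print(message.content)
```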
What people think about Qwen3 235B
Community reception on r/LocalLLaMA has been broadly positive, with the original Qwen3 release thread accumulating nearly 2,000 upvotes and over 430 comments. Users frequently highlight the model's coding and reasoning benchmark scores, as well as the efficiency of its MoE architecture that activates only 22B parameters at inference time.
A notable discussion thread from August 2025 focused on the addition of 1M token context support for the 2507 variants, which generated significant interest among users working with long documents. Some community members also noted the distinction between the instruct and thinking variants, with discussion around which version to use for different task types.
Qwen added 1M support for Qwen3-30B-A3B-Instruct-2507 and Qwen3-235B-A22B-Instruct-2507
Qwen 3 !!!
Qwen released Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!
Qwen released Qwen3-235B-A22B-2507!
Qwen/Qwen3-235B-A22B-Thinking-2507
Start building with Qwen3 235B
No API keys required. Create AI-powered workflows with Qwen3 235B in minutes — free.