Qwen3 235B
Alibaba's updated 235B Mixture-of-Experts model delivering top-tier reasoning, coding, and long-context understanding while activating only 22B parameters per token.
235B MoE model activating 22B parameters per token
Qwen3 235B is an instruction-tuned large language model developed by Alibaba's Qwen team, built on a Mixture-of-Experts (MoE) architecture with 235 billion total parameters. During inference, only 22 billion parameters are activated at a time, which reduces computational cost relative to the model's full parameter count. The model supports a native context window of 262,144 tokens and is released under the Apache 2.0 license, permitting commercial use.
This release, versioned as Qwen3-235B-A22B-Instruct-2507, is the non-thinking instruct variant, meaning it produces direct responses without exposing an internal chain-of-thought. It is designed for instruction following, agentic workflows, tool use, multilingual tasks, complex question answering, and coding. The model scores 51.8% on LiveCodeBench v6, 70.3% on AIME25, and 77.5% on GPQA, reflecting its range across coding, mathematical reasoning, and knowledge-intensive tasks.
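To show how the model is typically called in practice, here is a minimal sketch of a chat request through an OpenAI-compatible endpoint. The base URL, API key, and exact model ID below are placeholders and will vary by hosting provider.

```python
# Minimal sketch: a chat request to Qwen3-235B-A22B-Instruct-2507 via an
# OpenAI-compatible endpoint. Base URL, API key, and model ID are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                      # placeholder credential
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",  # exact ID may differ by provider
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Mixture-of-Experts inference in two sentences."},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
```

Because this is the non-thinking instruct variant, the reply arrives as a direct answer with no exposed reasoning trace.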
What Qwen3 235B supports
Long Context Processing
Handles up to 262,144 tokens natively in a single context window, with extended context support available via advanced attention mechanisms.
Instruction Following
Optimized for direct, helpful responses as the non-thinking instruct variant, without exposing internal chain-of-thought output.
Code Generation
Scores 51.8% on LiveCodeBench v6, covering real-world programming tasks across multiple languages.
Mathematical Reasoning
Achieves 70.3% on AIME25 and 41.8% on ARC-AGI, handling multi-step mathematical and logical problem solving.
Knowledge Retrieval
Scores 77.5% on GPQA and 54.3% on SimpleQA, reflecting broad factual knowledge across science and general domains.
Agentic Tool Use
Supports agentic workflows and tool-use scenarios, making it suitable for multi-step task execution and API-integrated pipelines.
Multilingual Text Generation
Generates and understands text across multiple languages, consistent with the broader Qwen3 model family's multilingual training.
MoE Efficient Inference
Uses a Mixture-of-Experts architecture that activates only 22B of 235B parameters per forward pass, reducing per-token compute.
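To make the per-token activation idea concrete, here is a toy sketch of top-k expert routing in PyTorch. The expert count, hidden size, and top-k values are illustrative only and do not reflect Qwen3's actual configuration or routing code.

```python
# Toy top-k MoE routing (illustrative values, not Qwen3's real configuration).
# Every token is scored against all experts, but only the top-k experts run,
# so per-token compute is a small fraction of the total parameter count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(ToyMoELayer()(tokens).shape)                       # torch.Size([10, 64])
```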
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 76.2% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 61.3% |
| MATH-500 | Undergraduate and competition-level math problems | 90.2% |
| AIME 2024 | Problems from the 2024 American Invitational Mathematics Examination | 32.7% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 34.3% |
| HLE | Humanity's Last Exam: frontier-level questions across many domains | 4.7% |
| SciCode | Scientific research coding and numerical methods | 29.9% |
Common questions about Qwen3 235B
What is the context window for Qwen3 235B?
Qwen3 235B supports a native context window of 262,144 tokens, which is approximately 200,000 words. Extended context beyond this, up to 1M tokens for the 2507 variants, is available via long-context attention mechanisms.
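As a rough pre-flight check, you can count a document's tokens with the model's tokenizer before sending it. The sketch below assumes the Hugging Face repo ID Qwen/Qwen3-235B-A22B-Instruct-2507 and a placeholder input file.

```python
# Minimal sketch: checking whether a document fits the 262,144-token native
# context window. The repo ID and file path below are assumptions/placeholders.
from transformers import AutoTokenizer

MAX_CONTEXT = 262_144

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Instruct-2507")

with open("long_report.txt", encoding="utf-8") as f:  # placeholder document
    document = f.read()

n_tokens = len(tokenizer.encode(document))
print(f"{n_tokens} tokens ({n_tokens / MAX_CONTEXT:.0%} of the native window)")

if n_tokens > MAX_CONTEXT:
    print("Too long for the native window: chunk the input or use extended-context serving.")
```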
How many parameters are actually used during inference?
Although the model has 235 billion total parameters, only 22 billion are activated at a time during inference due to its Mixture-of-Experts architecture.
What is the difference between this model and the Thinking variant?
This is the instruct (non-thinking) variant, which produces direct responses without exposing internal chain-of-thought reasoning. The Thinking variant (Qwen3-235B-A22B-Thinking-2507) is a separate model that outputs its reasoning process before answering.
What is the training data cutoff for this model?
The model metadata lists a training date of July 2025, which corresponds to the 2507 version suffix in the model name (year 2025, month 07).
What license does Qwen3 235B use?
Qwen3 235B is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution subject to the license terms.
What tasks is this model best suited for?
The model is designed for instruction following, agentic workflows, tool use, complex question answering, coding, multilingual tasks, and creative writing. It is not the recommended choice when visible chain-of-thought reasoning is required, as that is handled by the separate Thinking variant.
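For the agentic and tool-use scenarios mentioned above, the sketch below shows how a tool schema can be passed through an OpenAI-compatible chat API. The endpoint, API key, and get_weather tool are illustrative placeholders rather than anything specific to Qwen3.

```python
# Illustrative tool-use request via an OpenAI-compatible API.
# Endpoint, API key, and the get_weather tool are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",  # exact ID may differ by provider
    messages=[{"role": "user", "content": "Do I need an umbrella in Hangzhou today?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The model chose to call a tool; arguments arrive as a JSON string.
    print(message.tool_calls[0].function.name, message.tool_calls[0].function.arguments)
else:
    print(message.content)
```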
What people think about Qwen3 235B
Community reception on r/LocalLLaMA has been broadly positive, with the original Qwen3 release thread accumulating nearly 2,000 upvotes and over 430 comments. Users frequently highlight the model's coding and reasoning benchmark scores, as well as the efficiency of its MoE architecture that activates only 22B parameters at inference time.
A notable discussion thread from August 2025 focused on the addition of 1M token context support for the 2507 variants, which generated significant interest among users working with long documents. Some community members also noted the distinction between the instruct and thinking variants, with discussion around which version to use for different task types.
Qwen added 1M support for Qwen3-30B-A3B-Instruct-2507 and Qwen3-235B-A22B-Instruct-2507
Qwen 3 !!!
Qwen released Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!
Qwen released Qwen3-235B-A22B-2507!
Qwen/Qwen3-235B-A22B-Thinking-2507
Start building with Qwen3 235B
No API keys required. Create AI-powered workflows with Qwen3 235B in minutes — free.