MindStudio
Text Generation Model

Qwen3 235B

Alibaba's updated 235B-parameter Mixture-of-Experts model delivering top-tier reasoning, coding, and long-context understanding while activating only 22B parameters per token.

Publisher Qwen
Type Text
Context Window 262,144 tokens
Training Data July 2025
Input $0.15/MTok
Output $0.60/MTok
Provider DeepInfra
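The listed rates work out to fractions of a cent for typical requests. A minimal sketch of the arithmetic, using the $0.15/$0.60 per-million-token prices above (the helper name and the example token counts are illustrative):

```python
# Estimate a request's cost from the listed DeepInfra rates:
# $0.15 per million input tokens, $0.60 per million output tokens.
INPUT_RATE_PER_MTOK = 0.15
OUTPUT_RATE_PER_MTOK = 0.60

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_MTOK

# A prompt filling half the 262,144-token context window, with a
# 2,000-token reply:
cost = estimate_cost(131_072, 2_000)
print(f"${cost:.4f}")  # roughly $0.0209
```

Output-heavy workloads cost proportionally more, since output tokens are priced at four times the input rate.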

235B MoE model activating 22B parameters per token

Qwen3 235B is an instruction-tuned large language model developed by Alibaba's Qwen team, built on a Mixture-of-Experts (MoE) architecture with 235 billion total parameters. During inference, only 22 billion parameters are activated at a time, which reduces computational cost relative to the model's full parameter count. The model supports a native context window of 262,144 tokens and is released under the Apache 2.0 license, permitting commercial use.

This release, versioned as Qwen3-235B-A22B-Instruct-2507, is the non-thinking instruct variant, meaning it produces direct responses without exposing an internal chain-of-thought. It is designed for instruction following, agentic workflows, tool use, multilingual tasks, complex question answering, and coding. The model scores 51.8% on LiveCodeBench v6, 70.3% on AIME25, and 77.5% on GPQA, reflecting its range across coding, mathematical reasoning, and knowledge-intensive tasks.
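As an instruct model served through an OpenAI-compatible provider, it is typically called with a standard chat-completions payload. A hedged sketch of what such a request body might look like — the model identifier and endpoint follow DeepInfra's usual OpenAI-compatible convention, but check the provider's documentation for the exact values:

```python
import json

# Sketch of an OpenAI-style chat-completions payload for this model.
# The model identifier and endpoint path are assumptions based on
# DeepInfra's OpenAI-compatible API, not values confirmed by this page.
payload = {
    "model": "Qwen/Qwen3-235B-A22B-Instruct-2507",
    "messages": [
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Reverse a string in one line of Python."},
    ],
    "temperature": 0.7,   # the platform caps temperature at 1
    "max_tokens": 1024,
}

body = json.dumps(payload)
# POST this body to the provider's /chat/completions endpoint with an
# Authorization: Bearer <API key> header.
```

Because the model is the non-thinking variant, the response contains the answer directly, with no separate reasoning trace to parse out.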

What Qwen3 235B supports

Long Context Processing

Handles up to 262,144 tokens natively in a single context window, with extended context up to 1 million tokens reportedly available for the 2507 variants via sparse-attention techniques such as Dual Chunk Attention.

Instruction Following

Optimized for direct, helpful responses as the non-thinking instruct variant, without exposing internal chain-of-thought output.

Code Generation

Scores 51.8% on LiveCodeBench v6, covering real-world programming tasks across multiple languages.

Mathematical Reasoning

Achieves 70.3% on AIME25 and 41.8% on ARC-AGI, handling multi-step mathematical and logical problem solving.

Knowledge Retrieval

Scores 77.5% on GPQA and 54.3% on SimpleQA, reflecting broad factual knowledge across science and general domains.

Agentic Tool Use

Supports agentic workflows and tool-use scenarios, making it suitable for multi-step task execution and API-integrated pipelines.
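In OpenAI-compatible APIs, tool use works by passing the model a list of tool definitions alongside the messages; the model then replies with a tool call (a function name plus JSON arguments) instead of plain text. A minimal sketch of one such definition — the function name and fields here are hypothetical, not part of this page:

```python
import json

# Illustrative OpenAI-style tool definition for an agentic workflow.
# The tool name and parameters are made up for this example.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Passed to the API as tools=[weather_tool]; the caller executes the
# requested function and feeds the result back as a "tool" message.
print(json.dumps(weather_tool, indent=2))
```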

Multilingual Text Generation

Generates and understands text across multiple languages, consistent with the broader Qwen3 model family's multilingual training.

MoE Efficient Inference

Uses a Mixture-of-Experts architecture that activates only 22B of 235B parameters per forward pass, reducing per-token compute.
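The "22B of 235B" figure comes from top-k expert routing: a small gating network scores all experts for each token, and only the top-scoring few run. A toy sketch of that mechanism — the sizes here are illustrative (Qwen3 reportedly uses 128 experts with 8 active per token, far larger than this example):

```python
import math

# Toy top-k expert routing: pick the k highest-scoring experts for a
# token and normalize their weights with a softmax over just those k.
def route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k chosen experts."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# 8 experts, route this token to the top 2:
chosen = route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, 0.3, -0.2], k=2)
print(chosen)  # only experts 1 and 4 run for this token
```

The unchosen experts' parameters are never touched for that token, which is why per-token compute tracks the active 22B rather than the full 235B.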


Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark | What it tests | Score
MMLU-Pro | Expert knowledge across 14 academic disciplines | 76.2%
GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 61.3%
MATH-500 | Undergraduate and competition-level math problems | 90.2%
AIME 2024 | American Invitational Mathematics Examination problems | 32.7%
LiveCodeBench | Real-world coding tasks from recent competitions | 34.3%
HLE | Questions that challenge frontier models across many domains | 4.7%
SciCode | Scientific research coding and numerical methods | 29.9%

Common questions about Qwen3 235B

What is the context window for Qwen3 235B?

Qwen3 235B supports a native context window of 262,144 tokens, which is approximately 200,000 words. For the 2507 variants, extended context up to 1 million tokens is reportedly available via sparse-attention techniques such as Dual Chunk Attention.
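The approximately-200,000-word figure follows from the common rule of thumb of about 0.75 English words per token, which is a heuristic rather than a property of this model's tokenizer:

```python
# Rough token-to-word conversion for the native context window.
# 0.75 words per token is a common English-text heuristic; the real
# ratio varies by language and tokenizer.
CONTEXT_TOKENS = 262_144
WORDS_PER_TOKEN = 0.75

approx_words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)
print(approx_words)  # 196608, i.e. roughly 200,000 words
```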

How many parameters are actually used during inference?

Although the model has 235 billion total parameters, only 22 billion are activated at a time during inference due to its Mixture-of-Experts architecture.

What is the difference between this model and the Thinking variant?

This is the instruct (non-thinking) variant, which produces direct responses without exposing internal chain-of-thought reasoning. The Thinking variant (Qwen3-235B-A22B-Thinking-2507) is a separate model that outputs its reasoning process before answering.

What is the training data cutoff for this model?

Based on the metadata, the training date is listed as July 2025, which corresponds to the 2507 version suffix (year 25, month 07) in the model name.

What license does Qwen3 235B use?

Qwen3 235B is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution subject to the license terms.

What tasks is this model best suited for?

The model is designed for instruction following, agentic workflows, tool use, complex question answering, coding, multilingual tasks, and creative writing. It is not the recommended choice when visible chain-of-thought reasoning is required, as that is handled by the separate Thinking variant.

What people think about Qwen3 235B

Community reception on r/LocalLLaMA has been broadly positive, with the original Qwen3 release thread accumulating nearly 2,000 upvotes and over 430 comments. Users frequently highlight the model's coding and reasoning benchmark scores, as well as the efficiency of its MoE architecture that activates only 22B parameters at inference time.

A notable discussion thread from August 2025 focused on the addition of 1M token context support for the 2507 variants, which generated significant interest among users working with long documents. Some community members also noted the distinction between the instruct and thinking variants, with discussion around which version to use for different task types.


Parameters & options

Max Temperature 1
Max Response Size 262,144 tokens

Start building with Qwen3 235B

No API keys required. Create AI-powered workflows with Qwen3 235B in minutes — free.