Text Generation Model

Nemotron 3 Super 120B

NVIDIA's open-weight hybrid Mamba-Transformer MoE with 120B total / 12B active parameters, excelling in agentic reasoning, coding, and long-context tasks up to 1M tokens.

Publisher Nvidia
Type Text
Context Window 1,000,000 tokens
Training Data March 2026
Input $0.10/MTok
Output $0.50/MTok
Provider DeepInfra

Hybrid MoE with 1M token context window

Nemotron 3 Super 120B is an open-weight large language model released by NVIDIA in March 2026. It uses a hybrid LatentMoE architecture that combines Mamba-2, Mixture-of-Experts, and Attention layers, activating only 12 billion of its 120 billion total parameters per token. This design allows the model to handle demanding tasks while using significantly less compute than a dense model of comparable parameter count.
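The compute savings come from sparse routing: a learned gate selects a small subset of experts for each token, so most of the 120B parameters sit idle on any given forward pass. Below is a minimal, illustrative top-k MoE layer in PyTorch; the layer sizes, expert count, and top-k value are placeholders for illustration, not NVIDIA's actual LatentMoE configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k Mixture-of-Experts layer (not NVIDIA's actual config)."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # learned gate over experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run, so per-token compute scales with
        # top_k / n_experts of the total expert parameters.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoE()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

In a production MoE the per-expert loop is replaced by batched grouped matrix multiplies, but the routing idea is the same.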

The model is built for agentic workflows, long-context reasoning, and high-throughput deployments. It supports a context window of up to 1 million tokens and achieves a RULER-100 retrieval score of 91.75 at that length. Nemotron 3 Super 120B also includes a configurable thinking mode for step-by-step reasoning, supports seven languages (English, French, German, Italian, Japanese, Spanish, and Chinese), and ships as an open-weight model suitable for both cloud API and self-hosted use.

What Nemotron 3 Super 120B supports

Long Context Window

Processes up to 1 million tokens of context in a single request, with a reported RULER-100 retrieval accuracy of 91.75 at that length.
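For example, a quick pre-flight estimate helps confirm a large input actually fits before sending it. The sketch below uses a rough ~4 characters-per-token heuristic for English text, not Nemotron's actual tokenizer, and takes its 16,384-token response budget from the parameters section of this page.

```python
# Rough pre-flight check before sending a very long document. The ~4 chars
# per token ratio is a heuristic for English text, not Nemotron's tokenizer,
# so treat the estimate as approximate.
CONTEXT_WINDOW = 1_000_000
RESPONSE_BUDGET = 16_384  # max response size from the parameters section

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_context(document: str, prompt: str) -> bool:
    used = estimate_tokens(document) + estimate_tokens(prompt)
    return used + RESPONSE_BUDGET <= CONTEXT_WINDOW

doc = "..." * 100_000  # stand-in for a large corpus or codebase dump
print(fits_in_context(doc, "Summarize the key findings."))
```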

Agentic Reasoning

Designed for multi-step autonomous workflows including coding agents, planning, and tool use, with benchmark results on SWE-bench, Terminal Bench, and TauBench.

Configurable Thinking Mode

Supports an optional reasoning trace mode where the model generates step-by-step thinking before producing a final answer, useful for math and logic tasks.
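A hedged sketch of toggling the mode through DeepInfra's OpenAI-compatible endpoint is below. The model id and the `reasoning` field are assumptions drawn from this page's parameter listing; check the provider's documentation for the exact names.

```python
from openai import OpenAI

# DeepInfra exposes an OpenAI-compatible endpoint. The model id and the
# `reasoning` flag are assumptions based on this page's parameter listing.
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_DEEPINFRA_API_KEY",
)

resp = client.chat.completions.create(
    model="nvidia/Nemotron-3-Super-120B",  # hypothetical model id
    messages=[{"role": "user", "content": "What is 17 * 24? Show your steps."}],
    extra_body={"reasoning": True},  # toggles the step-by-step thinking trace
)
print(resp.choices[0].message.content)
```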

Code Generation

Handles code writing, debugging, and autonomous software engineering tasks, with evaluation results on SWE-bench and Terminal Bench.

Efficient MoE Inference

Activates only 12B of 120B total parameters per token using a LatentMoE architecture, reducing compute requirements compared to dense models at the same parameter scale.
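As a back-of-envelope check, a decoder forward pass costs roughly 2 FLOPs per active parameter per token, so 12B active parameters imply about a tenth of the per-token compute of a dense 120B model (ignoring attention's context-length term):

```python
# Back-of-envelope: forward-pass FLOPs per token ~= 2 * active parameters.
# This ignores attention's context-length term, so it's a lower bound.
ACTIVE_PARAMS = 12e9
TOTAL_PARAMS = 120e9

moe_flops = 2 * ACTIVE_PARAMS    # ~2.4e10 FLOPs/token
dense_flops = 2 * TOTAL_PARAMS   # ~2.4e11 FLOPs/token for a dense 120B model
print(f"Sparse/dense compute ratio: {moe_flops / dense_flops:.0%}")  # ~10%
```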

Multilingual Support

Supports text generation in seven languages: English, French, German, Italian, Japanese, Spanish, and Chinese.

Tool Calling

Supports structured tool-use and function-calling workflows, making it suitable for RAG pipelines and multi-step agent integrations.
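A minimal function-calling sketch against DeepInfra's OpenAI-compatible API is shown below. The `search_docs` tool and the model id are hypothetical; the tools schema itself follows the standard OpenAI function-calling format.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_DEEPINFRA_API_KEY",
)

# A single hypothetical tool the model can choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search the internal knowledge base for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="nvidia/Nemotron-3-Super-120B",  # hypothetical model id
    messages=[{"role": "user", "content": "Find our refund policy."}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model may also answer directly without a tool
    call = msg.tool_calls[0]
    print(call.function.name, call.function.arguments)
```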

Instruction Following

Trained to follow complex, multi-part instructions; evaluation includes reasoning-heavy benchmarks such as GPQA and HMMT.


Benchmark scores

Scores represent accuracy: the percentage of questions answered correctly on each test.

| Benchmark | What it tests | Score |
| --- | --- | --- |
| AIME 2025 | Problems from the 2025 American Invitational Mathematics Examination | 90.2% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 82.7% |
| SWE-bench Verified | Real GitHub issues requiring multi-file code fixes | 60.5% |

Common questions about Nemotron 3 Super 120B

What is the context window for Nemotron 3 Super 120B?

Nemotron 3 Super 120B supports a context window of up to 1 million tokens. NVIDIA reports a RULER-100 retrieval accuracy of 91.75 at the full 1M token length.

How many parameters does the model actually use during inference?

The model has 120 billion total parameters but activates only 12 billion per token during inference, thanks to its LatentMoE architecture combining Mamba-2, MoE, and Attention layers.

Is Nemotron 3 Super 120B open-weight?

Yes, Nemotron 3 Super 120B is released as an open-weight model. The model weights are available on Hugging Face, and NVIDIA updated the license after initial release to remove certain restrictive clauses that had drawn community concern.

What languages does Nemotron 3 Super 120B support?

The model supports seven languages: English, French, German, Italian, Japanese, Spanish, and Chinese.

What is the training data cutoff for this model?

The metadata on this page lists a training data date of March 2026, which coincides with the model's release month. A more specific training cutoff is not stated here; refer to the official technical report for details.

What hardware is Nemotron 3 Super 120B optimized for?

The model is designed with NVIDIA Blackwell architecture in mind. Community benchmarks have demonstrated NVFP4 inference running on a single RTX Pro 6000 Blackwell GPU.

What people think about Nemotron 3 Super 120B

Community reception on r/LocalLLaMA and r/singularity has been generally positive, with users highlighting the model's 1M token context window, fast inference due to its 12B active parameter design, and suitability for local deployment on Blackwell hardware. The hybrid SSM LatentMoE architecture and open-weight availability were frequently cited as notable attributes.

A significant early concern was the model's original license, which contained clauses users described as restrictive; this was resolved when NVIDIA updated the license shortly after release. Discussions also noted the absence of vision capabilities as a trade-off compared to some contemporaries, and users debated use cases where long context and speed matter more than multimodal support.


Parameters & options

Max Temperature 1
Max Response Size 16,384 tokens
Reasoning Disabled / Enabled (default: Disabled)
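Put together, those limits map onto a request roughly like this (reusing the client from the earlier examples; the model id and `reasoning` field remain assumptions):

```python
resp = client.chat.completions.create(
    model="nvidia/Nemotron-3-Super-120B",  # hypothetical model id
    messages=[{"role": "user", "content": "Draft a release announcement."}],
    temperature=1.0,                 # page lists a max temperature of 1
    max_tokens=16_384,               # page lists a 16,384-token response cap
    extra_body={"reasoning": False}, # reasoning defaults to disabled
)
print(resp.choices[0].message.content)
```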

Start building with Nemotron 3 Super 120B

No API keys required. Create AI-powered workflows with Nemotron 3 Super 120B in minutes — free.