Nemotron 3 Super 120B
NVIDIA's open-weight hybrid Mamba-Transformer MoE with 120B total / 12B active parameters, excelling in agentic reasoning, coding, and long-context tasks up to 1M tokens.
Hybrid MoE with 1M token context window
Nemotron 3 Super 120B is an open-weight large language model released by NVIDIA in March 2026. It uses a hybrid LatentMoE architecture that combines Mamba-2, Mixture-of-Experts, and Attention layers, activating only 12 billion of its 120 billion total parameters per token. This design allows the model to handle demanding tasks while using significantly less compute than a dense model of comparable parameter count.
The model is built for agentic workflows, long-context reasoning, and high-throughput deployments. It supports a context window of up to 1 million tokens and achieves a RULER-100 retrieval score of 91.75 at that length. Nemotron 3 Super 120B also includes a configurable thinking mode for step-by-step reasoning, supports seven languages (English, French, German, Italian, Japanese, Spanish, and Chinese), and is available as an open-weight model suitable for both cloud API and self-hosted use.
What Nemotron 3 Super 120B supports
Long Context Window
Processes up to 1 million tokens of context in a single request, with a reported RULER-100 retrieval accuracy of 91.75 at that length.
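Before packing a corpus into a single request, it helps to estimate whether it fits in the window. The sketch below uses a rough 4-characters-per-token heuristic for English text; this ratio is an assumption, not the model's actual tokenizer, so use the real tokenizer for exact counts.

```python
# Rough check of whether a set of documents fits in the 1M-token window.
CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4  # heuristic for English text; not the real tokenizer

def fits_in_context(texts, reserve_for_output=4_096):
    """Estimate total tokens for a list of documents and compare against
    the window, reserving room for the model's reply."""
    est_tokens = sum(len(t) for t in texts) // CHARS_PER_TOKEN
    budget = CONTEXT_WINDOW - reserve_for_output
    return est_tokens <= budget, est_tokens

# Eight ~400k-character documents come to roughly 800k estimated tokens.
ok, n = fits_in_context(["x" * 400_000] * 8)
```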
Agentic Reasoning
Designed for multi-step autonomous workflows including coding agents, planning, and tool use, with benchmark results on SWE-Bench, Terminal Bench, and TauBench.
Configurable Thinking Mode
Supports an optional reasoning trace mode where the model generates step-by-step thinking before producing a final answer, useful for math and logic tasks.
Code Generation
Handles code writing, debugging, and autonomous software engineering tasks, with evaluation results on SWE-Bench and Terminal Bench.
Efficient MoE Inference
Activates only 12B of 120B total parameters per token using a LatentMoE architecture, reducing compute requirements compared to dense models at the same parameter scale.
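The compute saving follows directly from the active-parameter count: per-token forward-pass FLOPs scale roughly with active parameters (a common estimate is ~2 FLOPs per parameter per token), so a 12B-active MoE does about one tenth the work of a 120B dense model. A back-of-envelope check:

```python
# Rough per-token compute comparison between the 12B-active MoE and a
# hypothetical 120B dense model, using the standard ~2*N FLOPs/token estimate.
TOTAL_PARAMS = 120e9
ACTIVE_PARAMS = 12e9

def forward_flops(params, tokens=1):
    return 2 * params * tokens

speedup = forward_flops(TOTAL_PARAMS) / forward_flops(ACTIVE_PARAMS)
print(speedup)  # -> 10.0
```

This is an idealized ratio; real throughput also depends on memory bandwidth, routing overhead, and how many experts must stay resident in memory.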
Multilingual Support
Supports text generation in seven languages: English, French, German, Italian, Japanese, Spanish, and Chinese.
Tool Calling
Supports structured tool-use and function-calling workflows, making it suitable for RAG pipelines and multi-step agent integrations.
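Function-calling endpoints for open-weight models commonly accept OpenAI-style tool definitions. The sketch below builds one such definition as a plain dict; the `get_weather` function and its fields are illustrative, not part of any official Nemotron API.

```python
# Build an OpenAI-style tool definition (a JSON-schema-described function).
# The tool name and parameters here are hypothetical examples.
def make_tool(name, description, properties, required):
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

weather_tool = make_tool(
    "get_weather",
    "Look up current weather for a city.",
    {"city": {"type": "string", "description": "City name"}},
    ["city"],
)
```

A payload like this would be passed in the `tools` array of a chat-completion request, assuming the serving stack exposes an OpenAI-compatible API.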
Instruction Following
Trained to follow complex, multi-part instructions and is evaluated on benchmarks including GPQA and HMMT for instruction-driven reasoning tasks.
Ready to build with Nemotron 3 Super 120B?
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| AIME 2025 | American Invitational Mathematics Examination problems (2025) | 90.2% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 82.7% |
| SWE-bench Verified | Real GitHub issues requiring multi-file code fixes | 60.5% |
Common questions about Nemotron 3 Super 120B
What is the context window for Nemotron 3 Super 120B?
Nemotron 3 Super 120B supports a context window of up to 1 million tokens. NVIDIA reports a RULER-100 retrieval accuracy of 91.75 at the full 1M token length.
How many parameters does the model actually use during inference?
The model has 120 billion total parameters but activates only 12 billion per token during inference, thanks to its LatentMoE architecture combining Mamba-2, MoE, and Attention layers.
Is Nemotron 3 Super 120B open-weight?
Yes, Nemotron 3 Super 120B is released as an open-weight model. The model weights are available on Hugging Face, and NVIDIA updated the license after initial release to remove certain restrictive clauses that had drawn community concern.
What languages does Nemotron 3 Super 120B support?
The model supports seven languages: English, French, German, Italian, Japanese, Spanish, and Chinese.
What is the training data cutoff for this model?
The model was released in March 2026, but a specific training data cutoff is not stated in the available metadata; refer to the official technical report for details.
What hardware is Nemotron 3 Super 120B optimized for?
The model is designed with NVIDIA Blackwell architecture in mind. Community benchmarks have demonstrated NVFP4 inference running on a single RTX Pro 6000 Blackwell GPU.
What people think about Nemotron 3 Super 120B
Community reception on r/LocalLLaMA and r/singularity has been generally positive, with users highlighting the model's 1M token context window, fast inference due to its 12B active parameter design, and suitability for local deployment on Blackwell hardware. The hybrid SSM LatentMoE architecture and open-weight availability were frequently cited as notable attributes.
A significant early concern was the model's original license, which contained clauses users described as restrictive; this was resolved when NVIDIA updated the license shortly after release. Discussions also noted the absence of vision capabilities as a trade-off compared to some contemporaries, and users debated use cases where long context and speed matter more than multimodal support.
Representative discussion threads:

- Nvidia updated the Nemotron Super 3 122B A12B license to remove the rug-pull clauses
- Qwen3.5 122b vs. Nemotron 3 Super 120b: Best-in-class vision Vs. crazy fast + 1M context (but no vision). Which one are you going to choose and why?
- Nemotron 3 Super release soon?
- Nemotron-3-Super-120B-A12B NVFP4 inference benchmark on one RTX Pro 6000 Blackwell
- Nvidia Nemotron 3 Super is here — 120B total / 12B active, Hybrid SSM Latent MoE, designed for Blackwell
Start building with Nemotron 3 Super 120B
No API keys required. Create AI-powered workflows with Nemotron 3 Super 120B in minutes — free.