Text Generation Model

Nemotron 3 Nano 30B

NVIDIA's compact open-weight hybrid MoE model with 30B total / 3.5B active parameters, delivering strong reasoning and coding performance up to 1M context.

Publisher NVIDIA
Type Text
Context Window 1,000,000 tokens
Release Date December 2025
Input $0.05/MTok
Output $0.20/MTok
Provider DeepInfra

Hybrid MoE reasoning with 1M token context

Nemotron 3 Nano 30B is an open-weight text generation model released by NVIDIA in December 2025 as part of the Nemotron 3 family. It uses a hybrid architecture combining 23 Mamba-2 layers, 23 Mixture-of-Experts (MoE) layers, and 6 attention layers, with 30B total parameters but only 3.5B active per token. This design lets the model handle complex tasks while using significantly less compute than a comparably sized dense model. It supports six languages: English, German, Spanish, French, Italian, and Japanese.
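The efficiency comes from sparse expert routing: for each token, a learned router activates only a few experts, so most of the network's weights stay idle on any given forward pass. The NumPy sketch below is a minimal illustration of top-k routing; the expert count, top-k value, and gating scheme are illustrative assumptions, not Nemotron's actual configuration.

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Route one token through a sparse Mixture-of-Experts layer.

    Only the top_k highest-scoring experts run for this token, so compute
    scales with top_k rather than with the total number of experts.
    """
    logits = router_w @ x                      # one routing score per expert
    chosen = np.argsort(logits)[-top_k:]       # indices of the selected experts
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                       # softmax over the selected experts
    # Weighted sum of the chosen experts' outputs; all other experts stay idle.
    return sum(g * (experts[i] @ x) for g, i in zip(gates, chosen))

rng = np.random.default_rng(0)
d_model, n_experts = 64, 16
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d_model))
token = rng.normal(size=d_model)
print(moe_layer(token, experts, router_w).shape)  # (64,), computed by 2 of 16 experts
```

At Nemotron's scale the same principle means roughly 3.5B of the 30B parameters participate in each token's forward pass.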

The model supports a context window of up to 1 million tokens, making it well-suited for long-document processing, retrieval-augmented generation (RAG), and agentic workflows. On math benchmarks it scores 89.1% on AIME25 without tools and 99.2% with tools, and it achieves 68.3% on LiveCodeBench and 38.8% on SWE-Bench for coding tasks. Its combination of low active-parameter count and long-context capability makes it a practical choice for high-volume or cost-sensitive deployments, edge agents, and instruction-following applications where compute efficiency matters.
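As a concrete starting point, the sketch below calls the model through DeepInfra's OpenAI-compatible endpoint for a long-document question. The model identifier is assumed from the Hugging Face repository name listed further down this page, and the file name and prompt are placeholders; check the provider's model listing for the exact id.

```python
from openai import OpenAI

# DeepInfra exposes an OpenAI-compatible API; the model id below is assumed
# from the Hugging Face repository name and may differ on the provider side.
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_DEEPINFRA_API_KEY",
)

with open("contract.txt") as f:        # placeholder long document
    long_document = f.read()

response = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",  # assumed id
    messages=[
        {"role": "system", "content": "Answer questions using only the provided document."},
        {"role": "user", "content": f"{long_document}\n\nQ: What are the termination clauses?"},
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```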

What Nemotron 3 Nano 30B supports

Long Context Window

Processes up to 1 million tokens in a single context, with strong long-range retrieval: it scores 86.3 on RULER-100 at the 1M token length.

Math Reasoning

Handles complex mathematical problems, scoring 89.1% on AIME25 without tools and 99.2% with tool use.

Code Generation

Generates and evaluates code across benchmarks, achieving 68.3% on LiveCodeBench and 38.8% on SWE-Bench.

Agentic Tool Use

Supports multi-step agentic tasks and tool calling, scoring 49.0 on TauBench and 53.8 on BFCL v4; a tool-calling sketch follows this feature list.

Hybrid MoE Architecture

Activates only 3.5B of 30B total parameters per token using a Mamba-2 and MoE hybrid design, reducing compute per inference pass.

Multilingual Support

Supports text generation in six languages: English, German, Spanish, French, Italian, and Japanese.

Instruction Following

Trained for instruction-following tasks, making it suitable for chat, RAG pipelines, and structured task completion.
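As referenced in the tool-use entry above, the sketch below defines one tool in the OpenAI function-calling format and reads back the model's tool call. The endpoint and model id are the same assumptions as in the earlier example, and the weather tool is purely illustrative; whether tool calling is exposed for this model depends on the provider.

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed endpoint
    api_key="YOUR_DEEPINFRA_API_KEY",
)

# One illustrative tool, declared in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",  # assumed id
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # the model may also answer directly without a tool
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```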

Ready to build with Nemotron 3 Nano 30B?

Get Started Free

Common questions about Nemotron 3 Nano 30B

What is the context window size for Nemotron 3 Nano 30B?

Nemotron 3 Nano 30B supports a context window of up to 1 million tokens, and the model achieves a RULER-100 score of 86.3 at that length, indicating strong long-range retrieval performance.

How many parameters does this model actually use during inference?

Although the model has 30B total parameters, its Mixture-of-Experts architecture activates only 3.5B parameters per token during inference, which reduces the compute cost compared to a dense 30B model.

What is the training data cutoff for Nemotron 3 Nano 30B?

The model was released in December 2025. The exact training data cutoff date is not specified in the available metadata; refer to the official technical report for details.

Is Nemotron 3 Nano 30B open-weight?

Yes, Nemotron 3 Nano 30B is an open-weight model. The BF16 weights are available on Hugging Face under the repository nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16.
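For local use, here is a minimal sketch of loading those weights with Hugging Face transformers. It assumes the hybrid Mamba/MoE architecture is supported by your transformers version (possibly via trust_remote_code) and that enough GPU memory is available for the BF16 checkpoint; the prompt is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,   # checkpoint is published in BF16
    device_map="auto",            # spread layers across available GPUs
    trust_remote_code=True,       # hybrid Mamba-2/MoE blocks may ship custom code
)

messages = [{"role": "user", "content": "Summarize the Nemotron 3 family in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```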

What languages does Nemotron 3 Nano 30B support?

The model supports six languages: English, German, Spanish, French, Italian, and Japanese.

What types of tasks is this model best suited for?

Based on its benchmark results and architecture, the model is well-suited for math reasoning, code generation, long-document processing, RAG systems, agentic tool-calling workflows, and instruction-following tasks where compute efficiency is a priority.

What people think about Nemotron 3 Nano 30B

Community reception on r/LocalLLaMA was largely positive, with the release announcement drawing over 850 upvotes and 180 comments, reflecting strong interest in the model's hybrid MoE architecture and 1M token context support. Users highlighted the efficiency of the 3.5B active-parameter design and the strong math and coding benchmark scores.

Some community members focused on practical local deployment, with a dedicated thread benchmarking the model using Vulkan and RPC backends to assess real-world performance on consumer hardware. Concerns and discussion points included inference speed on non-NVIDIA hardware and how the model performs outside of benchmark conditions.

View more discussions →

Parameters & options

Max Temperature 1
Max Response Size 16,384 tokens
Reasoning Select

When enabled, the model generates a reasoning trace before providing a final answer. This improves performance on complex tasks like math, coding, and multi-step reasoning, but results in longer responses and higher token usage.

Default: false
Options: Disabled / Enabled
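In API terms, the toggle usually maps to a request-level flag. The snippet below is a hypothetical sketch: the `reasoning` key passed through `extra_body` is an assumption, since providers expose reasoning switches in different ways (some earlier Nemotron releases toggled it through the system prompt instead); consult the provider's documentation for the real control.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed endpoint
    api_key="YOUR_DEEPINFRA_API_KEY",
)

response = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",  # assumed id
    messages=[{"role": "user", "content": "Prove that the sum of two odd integers is even."}],
    # Hypothetical provider-specific flag; the real switch may be a different
    # parameter or a system-prompt directive. Check the provider's docs.
    extra_body={"reasoning": True},
    max_tokens=2048,
)
print(response.choices[0].message.content)
```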

Start building with Nemotron 3 Nano 30B

No API keys required. Create AI-powered workflows with Nemotron 3 Nano 30B in minutes — free.