GPT OSS 20B
OpenAI's compact open-weight reasoning model that delivers powerful AI capabilities on consumer hardware, running within just 16GB of memory.
Open-weight reasoning model for local deployment
GPT OSS 20B is an open-weight text generation model released by OpenAI in August 2025, representing the company's first open-weight release since GPT-2 in 2019. It uses a Mixture-of-Experts (MoE) architecture with 21 billion total parameters, activating approximately 3.6 billion parameters per token across 4 of 32 experts in 24 layers. Combined with MXFP4 4-bit quantization, the model runs within 16GB of memory, making it suitable for consumer hardware and on-device deployment. It is licensed under Apache 2.0, allowing local hosting, firewall-protected deployment, and fine-tuning for custom use cases.
GPT OSS 20B supports a 128,000-token context window and includes adjustable reasoning levels — low, medium, and high — with chain-of-thought traces. Its documented strengths include coding, mathematical reasoning, and scientific analysis, along with tool use and agentic workflow support. The model also produces structured outputs for predictable, schema-conforming responses. It is available through Hugging Face, Amazon SageMaker, Amazon Bedrock, and NVIDIA NIM, and is well-suited for developers and organizations that require a self-hosted, customizable AI model without relying on cloud infrastructure.
What GPT OSS 20B supports
Adjustable Reasoning
Supports low, medium, and high reasoning levels with chain-of-thought traces, giving developers control over the depth of reasoning applied per request.
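In practice, gpt-oss models read the reasoning level from the system message of the chat. The exact prompt convention below (`"Reasoning: <level>"`) is an illustrative sketch of that idea, not the full harmony chat format:

```python
def build_messages(user_prompt: str, reasoning: str = "medium") -> list:
    """Build a chat message list that requests a given reasoning level.

    The "Reasoning: <level>" system-message convention shown here is a
    simplified sketch; consult the official gpt-oss chat format for the
    exact syntax your serving stack expects.
    """
    if reasoning not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning level: {reasoning}")
    return [
        {"role": "system", "content": f"Reasoning: {reasoning}"},
        {"role": "user", "content": user_prompt},
    ]

msgs = build_messages("Prove that sqrt(2) is irrational.", reasoning="high")
```

Higher levels trade latency for longer chain-of-thought traces, so a common pattern is defaulting to `medium` and escalating to `high` only for hard requests.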
Long Context Window
Handles up to 128,000 tokens per request, enabling processing of long documents, codebases, or extended multi-turn conversations in a single pass.
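Before sending a long document, it is worth checking that the prompt leaves room for the response. A rough budget check, assuming the common ~4-characters-per-token heuristic for English text (use the model's actual tokenizer for exact counts):

```python
CONTEXT_WINDOW = 128_000  # gpt-oss-20b context length in tokens

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """Check whether a prompt fits while reserving tokens for the reply."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW

doc = "word " * 50_000  # ~250k characters, roughly 62k tokens
print(fits_in_context(doc))  # True: well under the 128k window
```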
Coding and Math
Documented core strengths include code generation, mathematical reasoning, and scientific analysis tasks.
Tool Use and Agents
Supports tool calling and agentic workflows, allowing the model to interact with external functions and APIs as part of multi-step tasks.
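The basic agentic loop is: the model emits a tool call, the host executes it, and the result is fed back as context. A minimal dispatch sketch, where the two registered tools and the `{"name": ..., "arguments": {...}}` call shape are illustrative (real servers typically emit OpenAI-style `tool_calls`):

```python
import json

# Registry of functions the model is allowed to invoke (hypothetical tools).
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
    "add": lambda a, b: {"sum": a + b},
}

def dispatch(tool_call_json: str) -> dict:
    """Execute one model-emitted tool call of the form
    {"name": ..., "arguments": {...}} and return the result dict."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        # Returning an error object lets the model recover in the next turn.
        return {"error": f"unknown tool: {call['name']}"}
    return fn(**call["arguments"])

result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

Serializing `result` back into the conversation as a tool message closes the loop for multi-step tasks.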
Structured Outputs
Produces structured, schema-conforming responses for use cases that require predictable output formats such as JSON.
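Even with schema-conforming generation, applications typically validate the parsed output before use. A stdlib-only sketch (the extraction schema here is a made-up example):

```python
import json

def parse_structured(raw: str, required: dict) -> dict:
    """Parse a model response expected to be JSON and verify that each
    required field is present with the expected Python type."""
    data = json.loads(raw)
    for field, ftype in required.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], ftype):
            raise ValueError(f"field {field!r} is not {ftype.__name__}")
    return data

# Example: a schema the application expects the model to follow.
schema = {"title": str, "year": int, "tags": list}
reply = '{"title": "GPT OSS 20B", "year": 2025, "tags": ["open-weight", "moe"]}'
record = parse_structured(reply, schema)
```

On a `ValueError`, a common fallback is to re-prompt the model with the error message appended.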
On-Device Efficiency
Uses MXFP4 4-bit quantization and a MoE architecture to run within 16GB of memory, making local deployment on consumer hardware feasible.
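A back-of-envelope calculation shows why the 16GB figure is plausible: 21 billion parameters at 4 bits each is roughly 10.5 GB of weights, leaving headroom for activations and KV cache. The estimate below deliberately ignores MXFP4's per-block scale factors, which add a small overhead in practice:

```python
def mxfp4_weight_gb(params_billion: float, bits_per_param: float = 4.0) -> float:
    """Approximate weight memory in GB for a quantized model.
    Ignores the per-block scale factors MXFP4 stores alongside the
    4-bit values, so real footprints run slightly higher."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

weights = mxfp4_weight_gb(21)  # ~10.5 GB of weights for 21B parameters
```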
Open-Weight License
Released under Apache 2.0, allowing self-hosting, fine-tuning, and deployment behind firewalls without usage restrictions from the license.
Fine-Tuning Support
Supports fine-tuning for custom use cases, with documented integration via Hugging Face libraries on Amazon SageMaker.
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 74.8% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 68.8% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 77.7% |
| HLE | Questions that challenge frontier models across many domains | 9.8% |
| SciCode | Scientific research coding and numerical methods | 34.4% |
Common questions about GPT OSS 20B
What is the context window for GPT OSS 20B?
GPT OSS 20B supports a context window of 128,000 tokens, allowing it to process long documents, extended conversations, or large codebases in a single request.
What are the hardware requirements to run GPT OSS 20B locally?
Due to its MoE architecture and MXFP4 4-bit quantization, GPT OSS 20B can run within 16GB of memory, making it compatible with consumer-grade hardware without requiring a high-end GPU.
What license does GPT OSS 20B use?
GPT OSS 20B is released under the Apache 2.0 license, which permits local deployment, fine-tuning, and use behind firewalls for both personal and commercial purposes.
What is the training data cutoff for GPT OSS 20B?
The model was released in August 2025. A specific training data cutoff date is not listed in the available metadata; refer to the official model card on Hugging Face for the most accurate information.
What platforms can I use to deploy GPT OSS 20B?
GPT OSS 20B is available through Hugging Face, Amazon SageMaker, Amazon Bedrock, and NVIDIA NIM, in addition to local self-hosting using the open-weight model files.
Does GPT OSS 20B support fine-tuning?
Yes, GPT OSS 20B supports fine-tuning. AWS has published a guide for fine-tuning the model on Amazon SageMaker using Hugging Face libraries, and the Apache 2.0 license permits custom fine-tuning workflows.
What people think about GPT OSS 20B
Community reception on r/LocalLLaMA has been notably positive, with the open-weight release generating significant discussion and over 2,000 upvotes on the announcement thread. Users have praised the model's ability to run on consumer hardware, including older CPUs without dedicated NVIDIA GPUs.
A recurring theme in community threads is the model's efficiency on low-resource hardware, with users reporting usable inference speeds on machines as modest as an 8th-gen Intel i3. Some threads focus on benchmarking performance across specific GPU configurations, such as the RTX Pro 6000 Blackwell and RTX 5090M.
Representative thread titles include:
- 🚀 OpenAI released their open-weight models!!!
- OpenAI gpt-oss-20b & 120 model performance on the RTX Pro 6000 Blackwell vs RTX 5090M
- No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE.
Start building with GPT OSS 20B
No API keys required. Create AI-powered workflows with GPT OSS 20B in minutes — free.