MindStudio
Text Generation Model

GPT OSS 20B

OpenAI's compact open-weight reasoning model that delivers powerful AI capabilities on consumer hardware, running within just 16GB of memory.

Publisher OpenAI
Type Text
Context Window 128,000 tokens
Released August 2025
Input $0.10/MTok
Output $0.50/MTok
Provider Groq
OPEN SOURCE · VERY FAST

Open-weight reasoning model for local deployment

GPT OSS 20B is an open-weight text generation model released by OpenAI in August 2025, representing the company's first open-weight release since GPT-2 in 2019. It uses a Mixture-of-Experts (MoE) architecture with 21 billion total parameters, activating approximately 3.6 billion parameters per token across 4 of 32 experts in 24 layers. Combined with MXFP4 4-bit quantization, the model runs within 16GB of memory, making it suitable for consumer hardware and on-device deployment. It is licensed under Apache 2.0, allowing local hosting, firewall-protected deployment, and fine-tuning for custom use cases.

GPT OSS 20B supports a 128,000-token context window and includes adjustable reasoning levels — low, medium, and high — with chain-of-thought traces. Its documented strengths include coding, mathematical reasoning, and scientific analysis, along with tool use and agentic workflow support. The model also produces structured outputs for predictable, schema-conforming responses. It is available through Hugging Face, Amazon SageMaker, Amazon Bedrock, and NVIDIA NIM, and is well-suited for developers and organizations that require a self-hosted, customizable AI model without relying on cloud infrastructure.

What GPT OSS 20B supports

Adjustable Reasoning

Supports low, medium, and high reasoning levels with chain-of-thought traces, giving developers control over the depth of reasoning applied per request.
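As a sketch of how per-request reasoning control might look, the snippet below builds an OpenAI-compatible chat-completions payload with a reasoning-effort field. The parameter name ("reasoning_effort") and the model id are assumptions for illustration — check your provider's API reference for the exact spelling.

```python
# Sketch: choosing a reasoning level per request. The
# "reasoning_effort" field and model id are assumed names,
# not confirmed API details.
def build_request(prompt: str, effort: str = "medium") -> dict:
    # Only the three documented levels are accepted.
    if effort not in {"low", "medium", "high"}:
        raise ValueError("effort must be 'low', 'medium', or 'high'")
    return {
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

request = build_request("Prove that sqrt(2) is irrational.", effort="high")
print(request["reasoning_effort"])  # high
```

A higher effort level trades latency for deeper chain-of-thought, so a common pattern is defaulting to "medium" and escalating only when a first answer fails validation.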

Long Context Window

Handles up to 128,000 tokens per request, enabling processing of long documents, codebases, or extended multi-turn conversations in a single pass.
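A quick way to sanity-check whether an input fits the window is a character-count heuristic, sketched below. The ~4 characters-per-token ratio is a rough English-text approximation, not a property of this model — use the model's actual tokenizer for exact counts.

```python
# Rough pre-flight check against the 128,000-token context window,
# using the common ~4 chars/token heuristic (approximate; use the
# real tokenizer for exact counts).
CONTEXT_WINDOW = 128_000

def fits_in_context(text: str, chars_per_token: float = 4.0) -> bool:
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= CONTEXT_WINDOW

print(fits_in_context("word " * 50_000))  # ~62,500 tokens -> True
```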

Coding and Math

Documented core strengths include code generation, mathematical reasoning, and scientific analysis tasks.

Tool Use and Agents

Supports tool calling and agentic workflows, allowing the model to interact with external functions and APIs as part of multi-step tasks.
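To make the tool-calling loop concrete, here is a minimal host-side sketch: a tool declared in the widely used function-calling schema, plus a dispatcher that executes the call the model emits. The tool name, its stubbed behavior, and the exact shape of the returned tool call are illustrative assumptions.

```python
import json

# Hypothetical tool declaration in the common function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call: dict) -> str:
    # Route a model-issued tool call to a local function (stubbed here).
    args = json.loads(tool_call["arguments"])
    if tool_call["name"] == "get_weather":
        return f"Sunny in {args['city']}"
    raise KeyError(f"unknown tool: {tool_call['name']}")

# The model's reply would contain a call like this; the host runs it
# and feeds the result back as a tool message for the next turn.
print(dispatch({"name": "get_weather", "arguments": '{"city": "Oslo"}'}))
```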

Structured Outputs

Produces structured, schema-conforming responses for use cases that require predictable output formats such as JSON.
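A typical client-side complement to structured outputs is validating the model's JSON reply against the expected shape before using it. The sketch below checks a couple of assumed keys; real schema enforcement would normally happen server-side via the provider's structured-output option, with a library like jsonschema for fuller validation.

```python
import json

# Illustrative expected shape for a structured reply; the keys and
# types here are assumptions for the example.
SCHEMA_KEYS = {"title": str, "tags": list}

def validate(raw: str) -> dict:
    # Parse the model's reply and confirm each required key has
    # the expected type before the application consumes it.
    obj = json.loads(raw)
    for key, expected_type in SCHEMA_KEYS.items():
        if not isinstance(obj.get(key), expected_type):
            raise TypeError(f"{key!r} must be a {expected_type.__name__}")
    return obj

reply = '{"title": "MoE models", "tags": ["llm", "open-weight"]}'
print(validate(reply)["title"])  # MoE models
```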

On-Device Efficiency

Uses MXFP4 4-bit quantization and a MoE architecture to run within 16GB of memory, making local deployment on consumer hardware feasible.
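The 16GB figure follows from simple arithmetic: 21 billion parameters at 4 bits each is about 10.5 GB of weights, leaving headroom for activations and the KV cache. A back-of-the-envelope check:

```python
# Back-of-the-envelope weight-memory estimate for MXFP4 quantization.
total_params = 21e9   # 21 billion parameters
bits_per_param = 4    # MXFP4 stores ~4 bits per parameter
weight_gb = total_params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

print(round(weight_gb, 1))  # 10.5
```

This is the weights alone; actual runtime usage is higher (runtime overhead, KV cache growing with context length), which is why 16GB rather than 11GB is the stated requirement.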

Open-Weight License

Released under Apache 2.0, allowing self-hosting, fine-tuning, and deployment behind firewalls without usage restrictions from the license.

Fine-Tuning Support

Supports fine-tuning for custom use cases, with documented integration via Hugging Face libraries on Amazon SageMaker.

Ready to build with GPT OSS 20B?

Get Started Free

Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark What it tests Score
MMLU-Pro Expert knowledge across 14 academic disciplines 74.8%
GPQA Diamond PhD-level science questions (biology, physics, chemistry) 68.8%
LiveCodeBench Real-world coding tasks from recent competitions 77.7%
HLE (Humanity's Last Exam) Questions that challenge frontier models across many domains 9.8%
SciCode Scientific research coding and numerical methods 34.4%

Common questions about GPT OSS 20B

What is the context window for GPT OSS 20B?

GPT OSS 20B supports a context window of 128,000 tokens, allowing it to process long documents, extended conversations, or large codebases in a single request.

What are the hardware requirements to run GPT OSS 20B locally?

Due to its MoE architecture and MXFP4 4-bit quantization, GPT OSS 20B can run within 16GB of memory, making it compatible with consumer-grade hardware without requiring a high-end GPU.

What license does GPT OSS 20B use?

GPT OSS 20B is released under the Apache 2.0 license, which permits local deployment, fine-tuning, and use behind firewalls for both personal and commercial purposes.

What is the training data cutoff for GPT OSS 20B?

The model was released in August 2025. A specific training data cutoff date is not listed in the available metadata; refer to the official model card on Hugging Face for the most accurate information.

What platforms can I use to deploy GPT OSS 20B?

GPT OSS 20B is available through Hugging Face, Amazon SageMaker, Amazon Bedrock, and NVIDIA NIM, in addition to local self-hosting using the open-weight model files.

Does GPT OSS 20B support fine-tuning?

Yes, GPT OSS 20B supports fine-tuning. AWS has published a guide for fine-tuning the model on Amazon SageMaker using Hugging Face libraries, and the Apache 2.0 license permits custom fine-tuning workflows.

What people think about GPT OSS 20B

Community reception on r/LocalLLaMA has been notably positive, with the open-weight release generating significant discussion and over 2,000 upvotes on the announcement thread. Users have praised the model's ability to run on consumer hardware, including older CPUs without dedicated NVIDIA GPUs.

A recurring theme in community threads is the model's efficiency on low-resource hardware, with users reporting usable inference speeds on machines as modest as an 8th-gen Intel i3. Some threads focus on benchmarking performance across specific GPU configurations, such as the RTX Pro 6000 Blackwell and RTX 5090M.


Parameters & options

Max Temperature 2
Max Response Size 32,768 tokens

Start building with GPT OSS 20B

No API keys required. Create AI-powered workflows with GPT OSS 20B in minutes — free.