GPT OSS 20B
OpenAI's compact open-weight reasoning model that delivers powerful AI capabilities on consumer hardware, running within just 16GB of memory.
Open-weight reasoning model for local deployment
GPT OSS 20B is an open-weight text generation model released by OpenAI in August 2025, representing the company's first open-weight release since GPT-2 in 2019. It uses a Mixture-of-Experts (MoE) architecture with 21 billion total parameters, activating approximately 3.6 billion parameters per token across 4 of 32 experts in 24 layers. Combined with MXFP4 4-bit quantization, the model runs within 16GB of memory, making it suitable for consumer hardware and on-device deployment. It is licensed under Apache 2.0, allowing local hosting, firewall-protected deployment, and fine-tuning for custom use cases.
GPT OSS 20B supports a 128,000-token context window and includes adjustable reasoning levels — low, medium, and high — with chain-of-thought traces. Its documented strengths include coding, mathematical reasoning, and scientific analysis, along with tool use and agentic workflow support. The model also produces structured outputs for predictable, schema-conforming responses. It is available through Hugging Face, Amazon SageMaker, Amazon Bedrock, and NVIDIA NIM, and is well-suited for developers and organizations that require a self-hosted, customizable AI model without relying on cloud infrastructure.
What GPT OSS 20B supports
Adjustable Reasoning
Supports low, medium, and high reasoning levels with chain-of-thought traces, giving developers control over the depth of reasoning applied per request.
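In practice, gpt-oss models read the reasoning level from the system message of the chat. The exact prompt convention below (`"Reasoning: <level>"`) is an illustrative sketch of that idea, not the full harmony chat format:

```python
def build_messages(user_prompt: str, reasoning: str = "medium") -> list:
    """Build a chat message list that requests a given reasoning level.

    The "Reasoning: <level>" system-message convention shown here is a
    simplified sketch; consult the official gpt-oss chat format for the
    exact syntax your serving stack expects.
    """
    if reasoning not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning level: {reasoning}")
    return [
        {"role": "system", "content": f"Reasoning: {reasoning}"},
        {"role": "user", "content": user_prompt},
    ]

msgs = build_messages("Prove that sqrt(2) is irrational.", reasoning="high")
```

Higher levels trade latency for longer chain-of-thought traces, so a common pattern is defaulting to `medium` and escalating to `high` only for hard requests.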
Long Context Window
Handles up to 128,000 tokens per request, enabling processing of long documents, codebases, or extended multi-turn conversations in a single pass.
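Before sending a long document, it is worth checking that the prompt leaves room for the response. A rough budget check, assuming the common ~4-characters-per-token heuristic for English text (use the model's actual tokenizer for exact counts):

```python
CONTEXT_WINDOW = 128_000  # gpt-oss-20b context length in tokens

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """Check whether a prompt fits while reserving tokens for the reply."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW

doc = "word " * 50_000  # ~250k characters, roughly 62k tokens
print(fits_in_context(doc))  # True: well under the 128k window
```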
Coding and Math
Documented core strengths include code generation, mathematical reasoning, and scientific analysis tasks.
Tool Use and Agents
Supports tool calling and agentic workflows, allowing the model to interact with external functions and APIs as part of multi-step tasks.
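The basic agentic loop is: the model emits a tool call, the host executes it, and the result is fed back as context. A minimal dispatch sketch, where the two registered tools and the `{"name": ..., "arguments": {...}}` call shape are illustrative (real servers typically emit OpenAI-style `tool_calls`):

```python
import json

# Registry of functions the model is allowed to invoke (hypothetical tools).
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
    "add": lambda a, b: {"sum": a + b},
}

def dispatch(tool_call_json: str) -> dict:
    """Execute one model-emitted tool call of the form
    {"name": ..., "arguments": {...}} and return the result dict."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        # Returning an error object lets the model recover in the next turn.
        return {"error": f"unknown tool: {call['name']}"}
    return fn(**call["arguments"])

result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

Serializing `result` back into the conversation as a tool message closes the loop for multi-step tasks.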
Structured Outputs
Produces structured, schema-conforming responses for use cases that require predictable output formats such as JSON.
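Even with schema-conforming generation, applications typically validate the parsed output before use. A stdlib-only sketch (the extraction schema here is a made-up example):

```python
import json

def parse_structured(raw: str, required: dict) -> dict:
    """Parse a model response expected to be JSON and verify that each
    required field is present with the expected Python type."""
    data = json.loads(raw)
    for field, ftype in required.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], ftype):
            raise ValueError(f"field {field!r} is not {ftype.__name__}")
    return data

# Example: a schema the application expects the model to follow.
schema = {"title": str, "year": int, "tags": list}
reply = '{"title": "GPT OSS 20B", "year": 2025, "tags": ["open-weight", "moe"]}'
record = parse_structured(reply, schema)
```

On a `ValueError`, a common fallback is to re-prompt the model with the error message appended.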
On-Device Efficiency
Uses MXFP4 4-bit quantization and a MoE architecture to run within 16GB of memory, making local deployment on consumer hardware feasible.
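A back-of-envelope calculation shows why the 16GB figure is plausible: 21 billion parameters at 4 bits each is roughly 10.5 GB of weights, leaving headroom for activations and KV cache. The estimate below deliberately ignores MXFP4's per-block scale factors, which add a small overhead in practice:

```python
def mxfp4_weight_gb(params_billion: float, bits_per_param: float = 4.0) -> float:
    """Approximate weight memory in GB for a quantized model.
    Ignores the per-block scale factors MXFP4 stores alongside the
    4-bit values, so real footprints run slightly higher."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

weights = mxfp4_weight_gb(21)  # ~10.5 GB of weights for 21B parameters
```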
Open-Weight License
Released under Apache 2.0, allowing self-hosting, fine-tuning, and deployment behind firewalls without usage restrictions from the license.
Fine-Tuning Support
Supports fine-tuning for custom use cases, with documented integration via Hugging Face libraries on Amazon SageMaker.
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 74.8% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 68.8% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 77.7% |
| HLE | Questions that challenge frontier models across many domains | 9.8% |
| SciCode | Scientific research coding and numerical methods | 34.4% |
Common questions about GPT OSS 20B
What is the context window for GPT OSS 20B?
GPT OSS 20B supports a context window of 128,000 tokens, allowing it to process long documents, extended conversations, or large codebases in a single request.
What are the hardware requirements to run GPT OSS 20B locally?
Due to its MoE architecture and MXFP4 4-bit quantization, GPT OSS 20B can run within 16GB of memory, making it compatible with consumer-grade hardware without requiring a high-end GPU.
What license does GPT OSS 20B use?
GPT OSS 20B is released under the Apache 2.0 license, which permits local deployment, fine-tuning, and use behind firewalls for both personal and commercial purposes.
What is the training data cutoff for GPT OSS 20B?
The model was released in August 2025. A specific training data cutoff date is not listed in the available metadata; refer to the official model card on Hugging Face for the most accurate information.
What platforms can I use to deploy GPT OSS 20B?
GPT OSS 20B is available through Hugging Face, Amazon SageMaker, Amazon Bedrock, and NVIDIA NIM, in addition to local self-hosting using the open-weight model files.
Does GPT OSS 20B support fine-tuning?
Yes, GPT OSS 20B supports fine-tuning. AWS has published a guide for fine-tuning the model on Amazon SageMaker using Hugging Face libraries, and the Apache 2.0 license permits custom fine-tuning workflows.
What people think about GPT OSS 20B
Community reception on r/LocalLLaMA has been notably positive, with the open-weight release generating significant discussion and over 2,000 upvotes on the announcement thread. Users have praised the model's ability to run on consumer hardware, including older CPUs without dedicated NVIDIA GPUs.
A recurring theme in community threads is the model's efficiency on low-resource hardware, with users reporting usable inference speeds on machines as modest as an 8th-gen Intel i3. Some threads focus on benchmarking performance across specific GPU configurations, such as the RTX Pro 6000 Blackwell and RTX 5090M.
Representative thread titles include:
- 🚀 OpenAI released their open-weight models!!!
- OpenAI gpt-oss-20b & 120 model performance on the RTX Pro 6000 Blackwell vs RTX 5090M
- No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE.
Start building with GPT OSS 20B
No API keys required. Create AI-powered workflows with GPT OSS 20B in minutes — free.