GPT OSS 120B
OpenAI's flagship open-weight reasoning model with 117B parameters, built for powerful coding, math, and agentic tasks under the Apache 2.0 license.
OpenAI's open-weight reasoning model for code and math
GPT OSS 120B is OpenAI's largest open-weight model, released in August 2025 under the Apache 2.0 license. It has approximately 116.8 billion total parameters and uses a Mixture-of-Experts (MoE) architecture that activates only around 5.1 billion parameters per token, enabling efficient inference on a single H100 GPU. The model is part of the GPT OSS family and is designed for commercial and private deployments without licensing restrictions.
The model is built for coding, mathematical reasoning, scientific analysis, and agentic workflows. It supports a 128,000-token context window, adjustable reasoning levels (low, medium, and high), and native tool use, including web browsing, Python code execution, and custom developer-defined functions. Architecturally, it uses 36 transformer layers with 128 experts per MoE layer (top 4 active per token), Grouped Query Attention, Rotary Position Embeddings, and an alternating local/dense attention pattern. The model is available for local inference via Hugging Face Transformers, llama.cpp, and vLLM.
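The Transformers path can be sketched as below. This is a minimal example, assuming `transformers` is installed and H100-class hardware is available; the model id comes from the Hugging Face listing, and the heavy download only happens when `generate` is actually called:

```python
# Local-inference sketch via Hugging Face Transformers.
# MODEL_ID is taken from the Hugging Face listing; hardware requirements
# (a single H100-class GPU) are per the model card.
MODEL_ID = "openai/gpt-oss-120b"

messages = [
    {"role": "user", "content": "Explain Rotary Position Embeddings in two sentences."},
]

def generate(max_new_tokens=256):
    # Deferred import: transformers is a heavy dependency and the model
    # weights are ~120B parameters, so nothing is loaded at import time.
    from transformers import pipeline
    pipe = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype="auto",
        device_map="auto",
    )
    out = pipe(messages, max_new_tokens=max_new_tokens)
    # The pipeline returns the chat history with the assistant turn appended.
    return out[0]["generated_text"][-1]["content"]
```

The same model id can be served through vLLM or llama.cpp instead; only the launcher changes, not the prompt format.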
What GPT OSS 120B supports
Mixture-of-Experts Architecture
Uses a MoE design with 128 experts per layer, activating only ~5.1 billion of 116.8 billion total parameters per token for efficient inference.
Adjustable Reasoning
Supports low, medium, and high reasoning levels, allowing developers to tune the trade-off between response speed and reasoning depth.
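In practice the reasoning level is selected in the system prompt. A minimal sketch, assuming the `Reasoning: low|medium|high` line described in the model card's prompt format (treat the exact wording as an assumption):

```python
# Sketch: choosing a reasoning level for gpt-oss via the system prompt.
# The "Reasoning: <level>" convention follows the model card's prompt
# format; the surrounding wording is illustrative.
VALID_LEVELS = {"low", "medium", "high"}

def build_system_prompt(level: str) -> str:
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}")
    return f"You are a helpful assistant.\nReasoning: {level}"

print(build_system_prompt("high"))
```

Lower levels trade reasoning depth for latency, so "low" suits chat-style lookups while "high" suits multi-step math or code.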
Long Context Window
Handles up to 128,000 tokens per request, equivalent to roughly 100,000 words of text in a single prompt.
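The tokens-to-words figure follows from the common rule of thumb of roughly 0.75 English words per token (the exact ratio varies by tokenizer and text):

```python
# Back-of-envelope conversion from tokens to English words.
# 0.75 words/token is a rough heuristic, not a tokenizer-specific value.
def tokens_to_words(tokens: int, words_per_token: float = 0.75) -> int:
    return int(tokens * words_per_token)

print(tokens_to_words(128_000))  # → 96000, i.e. roughly 100,000 words
```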
Coding and Math
Designed for software development, mathematical reasoning, and scientific analysis tasks requiring multi-step problem solving.
Tool Use
Natively supports web browsing, Python code execution, and custom developer-defined functions as callable tools.
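A developer-defined function is typically exposed to the model as a JSON schema. The sketch below uses the OpenAI-style function-calling schema that OpenAI-compatible gpt-oss runtimes (such as vLLM's server) generally accept; `get_weather` and its parameters are hypothetical:

```python
import json

# Hypothetical developer-defined tool, described in the OpenAI-style
# function-calling schema. The model sees this schema and can emit a
# structured call to it instead of plain text.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

print(json.dumps(get_weather_tool, indent=2))
```

The schema is passed alongside the chat messages; when the model decides the tool is needed, it returns the function name and arguments rather than a final answer.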
Agentic Workflows
Built for multi-step agentic tasks and integrates with agent frameworks, supporting complex sequences of tool calls and decisions.
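The core of such a workflow is a dispatch loop: the model either requests a tool call or produces a final answer, and the runtime executes tools until it finishes. A minimal sketch with a stub model; real runtimes return structured tool-call objects rather than this simplified dict format:

```python
# Minimal agent-loop sketch. `model_step` stands in for a call to the
# model; the dict-based action format here is illustrative only.
def run_agent(model_step, tools, max_steps=8):
    history = []
    for _ in range(max_steps):
        action = model_step(history)          # model decides the next action
        if action["type"] == "final":
            return action["content"]          # done: return the answer
        result = tools[action["name"]](**action["arguments"])
        history.append({"tool": action["name"], "result": result})
    raise RuntimeError("agent did not finish within max_steps")

# Stub model: call one tool, then answer using its result.
def fake_model(history):
    if not history:
        return {"type": "tool", "name": "add", "arguments": {"a": 2, "b": 3}}
    return {"type": "final", "content": f"sum is {history[-1]['result']}"}

print(run_agent(fake_model, {"add": lambda a, b: a + b}))  # → sum is 5
```

The `max_steps` cap is the usual safeguard against a model that keeps requesting tools without converging.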
Open Source License
Released under the Apache 2.0 license, permitting commercial use, fine-tuning, and private deployment without royalty obligations.
Fast Inference
The MoE architecture keeps the active parameter count low, enabling fast inference, and the model fits on a single H100 GPU for local deployment.

Fine-Tuning Support
Supports fine-tuning workflows, allowing developers to adapt the base model to domain-specific tasks using standard training pipelines.
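One common adaptation path for a model this size is parameter-efficient fine-tuning such as LoRA. A sketch assuming the `peft` library; the hyperparameters and target module names are illustrative starting points, not an official recipe:

```python
# LoRA fine-tuning sketch. Values below are illustrative; tune them for
# your task. Target module names assume standard attention projections.
LORA_HYPERPARAMS = {
    "r": 16,                                  # adapter rank
    "lora_alpha": 32,                         # scaling factor
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],   # assumed projection names
}

def build_lora_model(base_model):
    # Deferred import: peft (and a loaded base model) are heavy dependencies.
    from peft import LoraConfig, get_peft_model
    config = LoraConfig(task_type="CAUSAL_LM", **LORA_HYPERPARAMS)
    return get_peft_model(base_model, config)
```

LoRA trains only the small adapter matrices, which matters here: full fine-tuning of ~117B parameters is far beyond a single-GPU budget.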
Ready to build with GPT OSS 120B?
Get Started Free

Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 80.8% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 78.2% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 87.8% |
| HLE | Questions that challenge frontier models across many domains | 18.5% |
| SciCode | Scientific research coding and numerical methods | 38.9% |
Common questions about GPT OSS 120B
What is the context window for GPT OSS 120B?
GPT OSS 120B supports a 128,000-token context window, which is roughly equivalent to 100,000 words of text in a single request.
What license does GPT OSS 120B use?
The model is released under the Apache 2.0 license, which permits commercial use, modification, fine-tuning, and private deployment.
What is the training data cutoff for GPT OSS 120B?
The model was released in August 2025. A specific training data cutoff date is not stated in the available metadata.
How many parameters does GPT OSS 120B have, and how does the MoE architecture affect inference?
The model has approximately 116.8 billion total parameters, but its Mixture-of-Experts architecture activates only around 5.1 billion parameters per token during inference, reducing compute requirements compared to a dense model of the same total size.
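The saving is easy to quantify from the figures above:

```python
# Back-of-envelope: share of parameters active per token in the MoE,
# using the totals stated for GPT OSS 120B.
TOTAL_B = 116.8   # total parameters, in billions
ACTIVE_B = 5.1    # parameters active per token, in billions

fraction = ACTIVE_B / TOTAL_B
print(f"{fraction:.1%} of parameters active per token")  # → 4.4%
```

So per-token compute is closer to that of a ~5B dense model, while the full 116.8B parameters still provide the model's capacity.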
Where can GPT OSS 120B be deployed?
The model is available on AWS via Amazon Bedrock and SageMaker JumpStart, on NVIDIA NIM, and locally through Hugging Face Transformers, llama.cpp, and vLLM. It fits on a single H100 GPU for local inference.
Does GPT OSS 120B support tool use and agentic tasks?
Yes. The model natively supports web browsing, Python code execution, and custom developer-defined functions, and it is designed for multi-step agentic workflows and integration with agent frameworks.
What people think about GPT OSS 120B
Community reception on r/LocalLLaMA has been largely positive, with the release announcement drawing over 2,000 upvotes and 551 comments, making it one of the more discussed open-weight model launches in the community. Users have praised the model's coding and reasoning capabilities, with one thread titled "OpenAI GPT-OSS-120b is an excellent model" accumulating 202 upvotes and 146 comments.
Some community members have raised concerns about benchmark performance, particularly on Simple-Bench, where results were described as disappointing in a dedicated thread. Creative writing and EQ-Bench results were also discussed separately, suggesting the community is actively evaluating the model across a range of tasks beyond coding and math.
- openai/gpt-oss-120b · Hugging Face
- OpenAI GPT-OSS-120b is an excellent model
- OpenAI gpt-oss-120b & 20b EQ-Bench & creative writing results
- GPT-OSS 120B Simple-Bench is not looking great either. What is going on Openai?
- 🚀 OpenAI released their open-weight models!!!
Start building with GPT OSS 120B
No API keys required. Create AI-powered workflows with GPT OSS 120B in minutes — free.