o1
Reasoning model optimized for coding, math, and science, with support for tools and Structured Outputs.
Reinforcement learning model for complex reasoning
OpenAI o1 is a large language model developed by OpenAI and trained using reinforcement learning to perform complex, multi-step reasoning. Unlike standard language models that respond immediately, o1 generates an internal chain of thought before producing its final answer, allowing it to work through difficult problems more systematically. It supports a 200,000-token context window, tool use, and Structured Outputs via the API.
The model is designed for tasks in coding, mathematics, and science where careful reasoning is more important than broad general knowledge. It has demonstrated notable benchmark results, including ranking in the 89th percentile on Codeforces competitive programming questions, placing among the top 500 students in the US on the AIME math qualifier, and exceeding human PhD-level accuracy on the GPQA benchmark covering physics, biology, and chemistry. It is well-suited for developers and researchers who need a model that can handle technically demanding problems within a large context.
What o1 supports
Chain-of-Thought Reasoning
Generates an internal chain of thought before responding, enabling systematic problem-solving across multi-step tasks. This reasoning process is produced automatically before each output.
Large Context Window
Supports up to 200,000 tokens of context, allowing long documents, codebases, or conversation histories to be processed in a single request.
Structured Outputs
Returns responses conforming to a specified JSON schema, making it straightforward to integrate model outputs into downstream applications.
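As a sketch of how Structured Outputs is typically used, the snippet below defines a JSON schema and parses a conforming reply. The schema name and fields (`math_solution`, `steps`, `final_answer`) are illustrative, not part of the API; the commented-out request shows where the schema would be passed with the official `openai` client.

```python
import json

# Illustrative schema for extracting a worked math solution.
# Field names here are made up for the example.
solution_schema = {
    "name": "math_solution",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "steps": {"type": "array", "items": {"type": "string"}},
            "final_answer": {"type": "string"},
        },
        "required": ["steps", "final_answer"],
        "additionalProperties": False,
    },
}

# With the official openai client, the schema would be supplied as:
# client.chat.completions.create(
#     model="o1",
#     messages=[{"role": "user", "content": "Solve 3x + 7 = 22"}],
#     response_format={"type": "json_schema", "json_schema": solution_schema},
# )

# The model's reply is then plain JSON that matches the schema:
reply = '{"steps": ["3x = 15", "x = 5"], "final_answer": "x = 5"}'
parsed = json.loads(reply)
```

Because the response is guaranteed to match the schema, downstream code can index fields like `parsed["final_answer"]` directly instead of scraping free-form text.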
Tool Use
Supports function calling and external tool integration, enabling the model to invoke developer-defined tools during a reasoning session.
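A minimal sketch of the function-calling flow: the developer declares a tool schema, and when the model emits a tool call, its arguments arrive as a JSON string to be dispatched to the real function. The `get_weather` tool and its stub implementation are invented for this example.

```python
import json

# Illustrative tool declaration in the OpenAI function-calling format.
# "get_weather" is a hypothetical developer-defined tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> dict:
    # Stub implementation standing in for a real weather lookup.
    return {"city": city, "temp_c": 21}

# When the model chooses to invoke the tool, the API response carries
# the call's arguments as a JSON string, which the developer executes:
tool_call_args = '{"city": "Paris"}'
result = get_weather(**json.loads(tool_call_args))
```

The result would then be sent back to the model in a follow-up message so it can incorporate the tool output into its final answer.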
Math & Science Tasks
Optimized for quantitative and scientific reasoning, with benchmark results including top-500 placement on the AIME qualifier and PhD-level accuracy on GPQA.
Code Generation
Handles complex programming tasks with documented performance at the 89th percentile on Codeforces competitive programming questions.
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 84.1% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 74.7% |
| MATH-500 | Undergraduate and competition-level math problems | 97.0% |
| AIME 2024 | Invitational competition math problems (US olympiad qualifier) | 72.3% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 67.9% |
| HLE | Questions that challenge frontier models across many domains | 7.7% |
| SciCode | Scientific research coding and numerical methods | 35.8% |
Common questions about o1
What is the context window for o1?
The o1 model supports a context window of 200,000 tokens, allowing large volumes of text, code, or documents to be included in a single request.
What is the training data cutoff for o1?
o1's training data has a cutoff in late 2023.
What types of tasks is o1 best suited for?
o1 is optimized for coding, mathematics, and science tasks that require complex, multi-step reasoning. It is particularly useful when the problem demands careful logical analysis rather than broad general knowledge.
Does o1 support tool use and structured outputs?
Yes. o1 supports both tool use (function calling) and Structured Outputs, which allows responses to conform to a developer-specified JSON schema.
How does o1 differ from standard OpenAI text generation models?
o1 is trained with reinforcement learning specifically to reason before responding. It produces an internal chain of thought prior to generating its final answer, which is distinct from models that respond without an explicit intermediate reasoning step.
What people think about o1
Community discussions around o1 frequently reference its role in OpenAI's broader model lineage, with some researchers noting that o1 and o3 represented a significant capability milestone that informed later model naming decisions. Developers have also discussed the API availability of o1 Pro alongside GPT-4.5, with some questioning the positioning and pricing strategy.
A recurring concern in threads is how o1 fits into a rapidly evolving competitive landscape, with comparisons drawn to models from other organizations on reasoning benchmarks. Practical use cases mentioned include competitive programming, scientific problem-solving, and tasks requiring structured, multi-step outputs.
- OpenAI's post-training lead leaves and joins Anthropic: he helped ship GPT-5, 5.1, 5.2, 5.3-Codex, o3 and o1 and will return to hands-on RL research at Anthropic
- OpenAI released GPT-4.5 and O1 Pro via their API and it looks like a weird decision.
- OpenAI Researcher: O1/O3 were undeniably GPT-5 level and it just took us time to have confidence to bump the name.
- MASSIVE release from China Baidu - Ernie 4.5 VLMs & LLMs, Models beat DeepSeek v3, Qwen 235B and competitive to OpenAI O1 - Apache 2.0
- Qwen 3 !!!
Parameters & options
reasoning_effort: gives the model guidance on how many reasoning tokens to generate before producing a response. Low favors speed and economical token usage; high favors more complete reasoning at the cost of more generated tokens and slower responses. The default is medium, a balance between speed and reasoning accuracy.
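As a sketch, the request below sets the effort level described above. The `reasoning_effort` parameter name follows the OpenAI API convention for reasoning models; here the request is only assembled as a dict, not sent, and the prompt text is illustrative.

```python
# Assembling a request that raises the reasoning-effort setting.
# Valid values are "low", "medium" (the default), and "high".
request = {
    "model": "o1",
    "reasoning_effort": "high",  # spend more reasoning tokens for a harder problem
    "messages": [
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
}
```

With the official client, these same keys would be passed as keyword arguments to `client.chat.completions.create(**request)`.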
Explore similar models
Start building with o1
No API keys required. Create AI-powered workflows with o1 in minutes — free.